Background
During the edit process, Extension:ConfirmEdit temporarily stores a small captcha token. The retention is set to 30 minutes, but in practice we delete the token proactively within a few seconds, once the captcha has been completed.
In 2019, as part of the MainStash audit for Multi-DC, we determined that Captcha data can remain in the MainStash interface and move along with the rest from MainStash Redis to MainStash DB.
In 2022, we added a new conceptual interface to MediaWiki in the WMF config, named mcrouter-primary-dc. This is currently adopted for:
- MediaWiki core rate limiter (PingLimiter, via WRStatsFactory and wgStatsCacheType)
- AbuseFilter profiling stats (via WRStatsFactory and wgStatsCacheType)
- CentralAuth tokens (via wgCentralAuthTokenCacheType)
Captcha data resides in MainStash DB today, and that's okay from a functional perspective and in terms of load/storage on MySQL. It works well enough in practice, but the mcrouter-primary-dc store may be a better fit for this type of highly ephemeral data nowadays.
Rationale
- Each of the three use cases currently has a novel configuration setting that sysadmins must discover and set "correctly". This is contrary to the ethos of "good defaults" and means these use cases are not part of standard explanations, diagrams and overviews such as at https://www.mediawiki.org/wiki/Object_cache. Recognising these three use cases as a genuine need will make them easier to understand and configure, resulting in a better default experience.
- Captcha data is sometimes read from the secondary DC, with an implicit assumption that the data was written just a moment ago in the primary DC and has already replicated, or vice versa. There is no guarantee in place that ensures this, making this effectively an undocumented dependency, contrary to the MainStash contract as documented at https://www.mediawiki.org/wiki/Object_cache#Main_stash which states that "reads can potentially be stale".
- Captcha data is highly ephemeral. The MainStash, however, is intended to provide "strong persistence", and storing short-lived captcha tokens there produces a fair bit of churn on MainStash DB and its binlogs.
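The stale-read concern in the second point can be illustrated with a toy model. This is only a sketch: the class `ReplicatedStash`, its methods, and the key name are hypothetical and do not correspond to any MediaWiki API. It assumes each data center has its own local store and that writes become visible in the other data center only after a replication step.

```python
class ReplicatedStash:
    """Toy model of a per-DC store with asynchronous replication (hypothetical)."""

    def __init__(self, dcs=("primary", "secondary")):
        # One independent key-value store per data center.
        self.stores = {dc: {} for dc in dcs}

    def set(self, dc, key, value):
        # A write only lands in the data center that handled the request.
        self.stores[dc][key] = value

    def get(self, dc, key):
        # A read only sees what the local data center currently holds.
        return self.stores[dc].get(key)

    def replicate(self):
        # Stand-in for replication catching up: every DC ends up with all keys.
        merged = {}
        for store in self.stores.values():
            merged.update(store)
        for dc in self.stores:
            self.stores[dc] = dict(merged)


stash = ReplicatedStash()
stash.set("primary", "captcha:abc", "token")

# Read in the other DC before replication has caught up: the token is missing.
before = stash.get("secondary", "captcha:abc")

stash.replicate()

# Only after replication does the read succeed.
after = stash.get("secondary", "captcha:abc")
```

In this model `before` is `None` and `after` is `"token"`. A store that always routes reads and writes to the primary data center avoids the window in which the token is not yet visible.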
Proposal
Introduce a fourth cache interface in MediaWiki, to be documented on https://www.mediawiki.org/wiki/Object_cache and recognised through a services getter:
- Local server cache: php-apcu, local server. Optional, defaults to empty.
- WAN cache: memcached, local data center. Optional, defaults to empty. (Wraps internal "main cache".)
- (To be named): memcached, primary data center. Required, defaults to main cache, falls back to CACHE_DB.
- Main stash: mysql, local data center + replication. Required, defaults to CACHE_DB.
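The default chain described for the new interface (explicit setting, else main cache, else CACHE_DB) could be modelled roughly as follows. This is a sketch under stated assumptions: the function `resolve_primary_dc_cache` and its parameters are hypothetical, and the string constants merely stand in for MediaWiki's `CACHE_NONE` and `CACHE_DB` rather than reproducing their real values.

```python
# Stand-ins for MediaWiki's cache type constants (values are illustrative only).
CACHE_NONE = "none"
CACHE_DB = "db"


def resolve_primary_dc_cache(explicit, main_cache):
    """Pick a backend for the proposed (to be named) interface.

    explicit:   the sysadmin's setting for the new interface, or None if unset.
    main_cache: the configured main cache type, or CACHE_NONE if empty.
    """
    if explicit is not None:
        # An explicit setting always wins.
        return explicit
    if main_cache != CACHE_NONE:
        # Default: reuse the main cache when one is configured.
        return main_cache
    # The interface is required, so never return an empty backend.
    return CACHE_DB


# Fresh install with no caching configured.
bare_install = resolve_primary_dc_cache(None, CACHE_NONE)
# Single-DC site with memcached as main cache.
single_dc = resolve_primary_dc_cache(None, "memcached")
# Multi-DC setup that points the interface at a dedicated store.
multi_dc = resolve_primary_dc_cache("mcrouter-primary-dc", "memcached")
```

The point of the fallback is that callers can rely on the interface always being backed by something that persists across requests, without each extension needing its own configuration variable.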
Work
- Introduce a new setting in wmf-config to control the new cache service interface.
- Add a new cache service interface in MediaWiki core, similar to LocalServerCache and MainStash. This will have a name (TBD), a configuration variable, a documented set of contract expectations for developers, and must have a non-empty default that meets the contract (probably MainStash, since ClusterCache is not replicated).
- Document it on https://www.mediawiki.org/wiki/Object_cache
- Migrate ConfirmEdit tokens to the new service interface.
- Migrate CentralAuth tokens to the new service interface.
- Remove workarounds in favour of the new standard interface (wgStatsCacheType ✅, WRStatsFactory ✅, wgCentralAuthTokenCacheType ✅)