What?
mcrouter is our memcache router. It is responsible for the replication and sharding of our memcached data, as well as for the reliability of the service. mcrouter maintains a pool of connections to all servers in our memcache cluster. Currently in MW-on-K8s, mcrouter is one of the many containers in a mediawiki pod. As discussed in T277711, mcrouter could instead run as a daemonset.
Why?
First of all, mcrouter is robust and capable of serving *a lot* of traffic. A few reasons why we would make better use of mcrouter if it ran as a daemonset:
- Remove 2 containers from each mediawiki pod (mcrouter and its exporter)
- mcrouter's version and configuration rarely change (and configuration changes require no restart), so we avoid starting yet another container on every mw deployment
- Faster failover: since each daemonset instance receives more traffic, it will notice a failing memcache server sooner and fail over to the gutter pool faster
- Lower memory overhead: on bare metal we run 1 mcrouter per 96 php-fpm workers (avg. memory ~135 MB), versus 1 mcrouter per 8 php-fpm workers in mw-on-k8s (avg. memory ~50 MB per pod)
- Fewer connections to the memcache cluster: each mcrouter maintains a connection pool to every memcached server, so fewer mcrouter instances means fewer connections
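The memory, connection, and failover arguments above are simple arithmetic, sketched below in Python. The memory figures are the averages quoted above; the cluster sizes, the per-server connection count, the request rates, and the failure threshold are hypothetical placeholders. The failover function assumes a server is marked down after a fixed number of failed requests, which is a simplification of mcrouter's real behaviour. A shared daemonset instance would also likely use more memory than a 50 MB in-pod one, so the bare-metal figure is only a rough guide.

```python
# Back-of-the-envelope comparison of in-pod mcrouter vs. one mcrouter per node.
# Memory figures are the averages quoted above; every other number
# (node/pod/server counts, failure threshold, request rates) is HYPOTHETICAL.

BAREMETAL_WORKERS_PER_MCROUTER = 96
BAREMETAL_MCROUTER_MB = 135
K8S_WORKERS_PER_MCROUTER = 8
K8S_MCROUTER_MB = 50


def mcrouter_memory_mb(workers: int, workers_per_mcrouter: int, mb_per_mcrouter: int) -> int:
    """Total mcrouter memory needed to serve `workers` php-fpm workers."""
    instances = -(-workers // workers_per_mcrouter)  # ceiling division
    return instances * mb_per_mcrouter


def memcached_connections(mcrouter_instances: int, memcached_servers: int, conns_per_server: int = 1) -> int:
    """Every mcrouter keeps its own pool of connections to every memcached server."""
    return mcrouter_instances * memcached_servers * conns_per_server


def seconds_until_failover(failures_needed: int, requests_per_second: float) -> float:
    """Time to accumulate enough failed requests to mark a server down (count-based detection assumed)."""
    return failures_needed / requests_per_second


if __name__ == "__main__":
    # Serving 96 php-fpm workers: one shared mcrouter vs. twelve in-pod ones.
    print("bare metal:", mcrouter_memory_mb(96, BAREMETAL_WORKERS_PER_MCROUTER, BAREMETAL_MCROUTER_MB), "MB")
    print("in-pod:    ", mcrouter_memory_mb(96, K8S_WORKERS_PER_MCROUTER, K8S_MCROUTER_MB), "MB")

    # Hypothetical cluster: 50 nodes running 400 mediawiki pods, 18 memcached servers.
    print("sidecars:  ", memcached_connections(400, 18), "connections")
    print("daemonset: ", memcached_connections(50, 18), "connections")

    # Hypothetical threshold of 3 failures; a per-node mcrouter sees ~8x the requests.
    print("in-pod:    ", seconds_until_failover(3, 10.0), "s")
    print("daemonset: ", seconds_until_failover(3, 80.0), "s")
```

With these inputs, serving 96 workers costs ~135 MB with one shared mcrouter versus 12 × 50 MB = 600 MB with in-pod mcrouters, and the connection count scales with the number of mcrouter instances (pods vs. nodes) rather than with traffic.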
Drawbacks
- If the daemonset becomes unavailable, either all mw-* pods on that node or entire mw-* deployments will fail
  - Given mcrouter's track record in incidents, such a failure is unlikely to be caused by the software itself
- Rolling out changes will need extra care (changes are rare, but still)
- We may run into mcrouter scaling issues that we have not seen so far
Future Work
- If mcrouter runs as a daemonset, potentially any application that uses a memcache cluster could use the corresponding mcrouter service
How?
Roadmap:
- Create an mcrouter chart and namespace
- Deploy mcrouter as a daemonset, so that an instance is available on every node
- Create a Service with type: ClusterIP and internalTrafficPolicy: Local
  - accessible only from within Kubernetes
  - this ensures that pods on a node talk to the daemonset pod on that same node via "mcrouter-service.mcrouter.svc.cluster.local"
- Make $wgObjectCaches['mcrouter']['servers'] an environment variable that we can define in values.yaml (T326705: Allow php-fpm to read environment variables from the system, not just from the fcgi request)
  - this will make it easy to switch between the in-pod mcrouter and the mcrouter daemonset, and thus to test
  - update the php7.4-fpm image to pass env['MCROUTER_SERVER'] in the fpm pools
  - update the mediawiki chart with the relevant changes
- Point mw-debug to the daemonset
- Point the codfw mw-* deployments to the daemonset
- Point the eqiad mw-* deployments to the daemonset
- Remove the mcrouter container from the mw-* pods
- Update Wikitech: mw-mcrouter and related pages
- Update Grafana dashboards
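The daemonset, the node-local Service, and the environment-variable switch from the roadmap can be sketched as follows. This is illustrative Python only: the manifests are plain dicts printed as JSON (which kubectl accepts alongside YAML), whereas in practice they would be templates in the mcrouter chart, and the endpoint selection would live in the PHP that builds $wgObjectCaches. The app label, image name, port 11213, and the 127.0.0.1 in-pod default are hypothetical placeholders, not the real chart values.

```python
import json
import os

# HYPOTHETICAL values for illustration: the app label, image, port, and the
# in-pod default address are placeholders, not the real chart values.
NAMESPACE = "mcrouter"
SERVICE_NAME = "mcrouter-service"
LABELS = {"app": "mcrouter"}
MCROUTER_PORT = 11213
IN_POD_DEFAULT = f"127.0.0.1:{MCROUTER_PORT}"


def mcrouter_daemonset() -> dict:
    """One mcrouter pod per node."""
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": "mcrouter", "namespace": NAMESPACE},
        "spec": {
            "selector": {"matchLabels": LABELS},
            "template": {
                "metadata": {"labels": LABELS},
                "spec": {
                    "containers": [{
                        "name": "mcrouter",
                        "image": "mcrouter:placeholder",  # hypothetical image
                        "ports": [{"containerPort": MCROUTER_PORT, "protocol": "TCP"}],
                    }],
                },
            },
        },
    }


def mcrouter_service() -> dict:
    """ClusterIP Service that only routes to the mcrouter pod on the caller's node."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": SERVICE_NAME, "namespace": NAMESPACE},
        "spec": {
            "type": "ClusterIP",               # reachable only from inside the cluster
            "internalTrafficPolicy": "Local",  # never forwarded to another node
            "selector": LABELS,
            "ports": [{"name": "mcrouter", "protocol": "TCP",
                       "port": MCROUTER_PORT, "targetPort": MCROUTER_PORT}],
        },
    }


def mcrouter_server(env=None) -> list:
    """Server list for $wgObjectCaches['mcrouter']['servers']: MCROUTER_SERVER if set, else the in-pod sidecar."""
    env = os.environ if env is None else env
    return [env.get("MCROUTER_SERVER", IN_POD_DEFAULT)]


if __name__ == "__main__":
    # kubectl apply -f accepts JSON as well as YAML.
    print(json.dumps({"apiVersion": "v1", "kind": "List",
                      "items": [mcrouter_daemonset(), mcrouter_service()]}, indent=2))
    print(mcrouter_server({}))
    print(mcrouter_server({"MCROUTER_SERVER": f"{SERVICE_NAME}.{NAMESPACE}.svc.cluster.local:{MCROUTER_PORT}"}))
```

One consequence of internalTrafficPolicy: Local worth keeping in mind: if no mcrouter pod on a node is ready, Kubernetes drops the traffic instead of routing it to another node, which is the node-level failure mode listed under Drawbacks.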