
Reduce static asset time on disk from five trains' worth to two
Closed, Resolved, Public

Description

Currently, we're still in the habit of keeping the last five branches on disk (https://wikitech.wikimedia.org/wiki/Heterogeneous_deployment/Train_deploys#Remove_clones_of_expired_branches).

This may be unnecessary given that we've moved towards using static.php where the max-age for assets is 24 hours:

$ curl -sI 'https://en.wikipedia.org/w/extensions/Math/images/button_math.png' | grep -i cache-control
cache-control:public, s-maxage=300, must-revalidate, max-age=86400
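For reference, a small shell helper can pull the browser TTL (max-age) out of a Cache-Control value like the one above and convert it to hours. This is a minimal sketch: the function name max_age is made up for illustration, and it assumes a grep that supports -o (GNU and BSD grep both do). It deliberately ignores s-maxage, which applies to shared caches such as Varnish rather than to browsers.

```shell
# Print the max-age value (in seconds) from a Cache-Control header value.
# "s-maxage=" does not contain the substring "max-age=", so it is not matched.
max_age() {
  printf '%s\n' "$1" | grep -oi 'max-age=[0-9]*' | head -n 1 | cut -d= -f2
}

# Example: the header returned for the unversioned button_math.png URL.
hdr='public, s-maxage=300, must-revalidate, max-age=86400'
secs=$(max_age "$hdr")
echo "max-age: ${secs}s = $(( secs / 3600 )) hours"   # max-age: 86400s = 24 hours
```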

It would be nice to free up some disk space and not have to worry about deleting assets that are still in use.

Related Objects

Status | Subtype | Assigned
Resolved | | Legoktm
Resolved | | GWicke
Resolved | | mmodell
Open | | None
Resolved | | demon
Declined | | mmodell
Declined | | mmodell
Resolved | | dduvall
Resolved | | Krinkle
Resolved | | Krinkle
Resolved | | mmodell
Duplicate | | Krinkle
Resolved | | Krinkle
Resolved | | Krinkle
Resolved | PRODUCTION ERROR | MaxSem
Resolved | | Krinkle
Resolved | | Krinkle
Resolved | | Krinkle
Resolved | | Krinkle
Resolved | | Krinkle

Event Timeline

> This may be unnecessary given that we've moved towards using static.php where the max-age for assets is 24 hours:
>
> curl -sI 'https://en.wikipedia.org/w/extensions/Math/images/button_math.png' | grep -i cache-control
> cache-control:public, s-maxage=300, must-revalidate, max-age=86400

This is incorrect. A max-age of 24 hours is only the fallback for canonical (unversioned) URLs. By far the most common requests to static URLs are for versioned ones, such as: https://en.wikipedia.org/w/skins/Vector/images/user-icon.png?13155

$ curl -sI 'https://en.wikipedia.org/w/skins/Vector/images/user-icon.png?13155' | grep -i cache-control
Cache-control: public, s-maxage=31536000, max-age=31536000

It's cached for up to 1 year. Anyway, don't worry: the cache lifetime of the resource is not relevant for deciding how long to keep branches. (If anything, it's the opposite: the shorter the cache lives, the more important it is that the resource can be regenerated from disk.)

Just as before (T46570), the relevant metric is how long the stuff that contains these URLs is cached, not the stuff behind these URLs.

Content generated by MediaWiki skins or OutputPage is cached for 14 days in Varnish backends and 1 day in Varnish frontends (used to be 31 days; see T124954).

Content generated by core and extensions in ParserOutput (e.g. any image references inside page content, such as those from the WikiHiero extension) is still cached for up to 30 days.

As such, each branch must be kept for 30 days after the last wiki stops using it.
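The rule above is simple arithmetic: a branch becomes deletable once the longest cache lifetime of content that may still reference it (30 days for ParserOutput content, as noted above) has elapsed since the last wiki stopped using it. A minimal sketch, assuming timestamps are available as Unix epoch seconds; branch_deletable is a hypothetical helper, not part of any existing deployment tooling.

```shell
# Hypothetical helper: succeeds (exit status 0) when a branch may be removed.
#   $1  epoch seconds at which the last wiki stopped using the branch
#   $2  current epoch seconds
#   $3  longest cache lifetime, in days, of content that may reference the branch
branch_deletable() {
  local last_used=$1 now=$2 ttl_days=$3
  local earliest=$(( last_used + ttl_days * 86400 ))
  [ "$now" -ge "$earliest" ]
}

day=86400
last=$(( 100 * day ))   # arbitrary example timestamp
branch_deletable "$last" $(( last + 29 * day )) 30 && echo deletable || echo keep   # keep
branch_deletable "$last" $(( last + 30 * day )) 30 && echo deletable || echo keep   # deletable
```

Passing the TTL as a parameter keeps the check valid if the cache lifetimes change later.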

I discussed this at some length with @demon and @thcipriani. These are the two options we came up with; either one should work. Option 1 would save disk space; however, option 2 might be more robust and easier to implement.

  1. Delay removal of static resources
    • When removing a static resource during the weekly branch merge, add the file to a FIFO list that static.php will read and recognize.
    • Only after a file has been retained for 30 days can we actually delete it and remove it from the list.
  2. Snapshot the deployed branch before merging into it
    • Retain the entire branch for 30 days under a static name.
    • This would also facilitate quick rollbacks.
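Option 1 could be sketched roughly as follows. This is only an illustration under assumptions the discussion does not specify: the list is a plain text file with one tab-separated epoch timestamp and file path per line, appended in removal order, and fifo_add/fifo_expire are hypothetical names. How static.php would read the list is left out.

```shell
# Hypothetical retention list: one "<epoch seconds><TAB><path>" line per
# removed static file, appended in removal order (oldest first).

fifo_add() {      # fifo_add LIST PATH NOW
  printf '%s\t%s\n' "$3" "$2" >> "$1"
}

fifo_expire() {   # fifo_expire LIST NOW TTL_DAYS
  # Print paths retained for at least TTL_DAYS (now safe to delete), then
  # rewrite LIST so that only the entries still inside the window remain.
  local list=$1 cutoff=$(( $2 - $3 * 86400 ))
  [ -f "$list" ] || return 0
  awk -F'\t' -v c="$cutoff" '$1 <= c { print $2 }' "$list"
  awk -F'\t' -v c="$cutoff" '$1 > c' "$list" > "$list.tmp" && mv "$list.tmp" "$list"
}

# Example with illustrative paths: one file removed at t=0, one at day 20.
day=86400
list=$(mktemp)
fifo_add "$list" images/removed-a.png 0
fifo_add "$list" images/removed-b.png $(( 20 * day ))
fifo_expire "$list" $(( 31 * day )) 30   # prints images/removed-a.png
rm -f "$list"
```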

To be honest, I think option 1 will be the best. The FIFO could be managed automatically via .gitignore during RelEng's weekly branch work. Images would Just Work in such a scenario, whereas CSS/JS would need some modification, I believe, since the files would no longer be registered with ResourceLoader?

@Krinkle I'm curious what thoughts you have here.

Addendum: I looked into static.php and the related Apache config a bit further, and it seems like it will Mostly Work The Way I Want as it is right now. The main difference in the context of long-lived branches will be adjusting the logic in static.php to check different/fewer locations than we did before.

Is this task resolved? It looks like the root concerns here may be addressed:

  • We no longer expose branch names in URLs. Instead, URLs are based only on the original file paths (under $IP) and the hash of their contents (T99096).
  • All direct and indirect caching of resources (or of URLs to resources) within our control has a known, fixed limit, with the exception of APC (the per-server PHP key-value cache) and localStorage (the client-side module cache). Those two mostly have no expiry and are keyed solely on content hashes. This is fine because both mechanisms are also revalidated at run time against the current registry (which is not stored there). So while a client can indeed have an unchanged module blob that is several months old, that's only the case if neither the content nor the relative URL to that content has changed. This was previously a problem because we served those relative URLs from a base directory named after a temporary deployment branch (which would cease to exist after a while).
  • Varnish caching is now limited to 7 days (T124954).
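To illustrate the first point, here is a sketch of a URL built only from a file's path under $IP and a hash of its contents, so that two branch checkouts holding identical bytes yield the same URL. The /w/ prefix mirrors the example URLs earlier in this task; using md5 and keeping the first five hex characters are assumptions for illustration, not necessarily how MediaWiki computes it, and versioned_url is a hypothetical name. It requires GNU coreutils md5sum.

```shell
# Hypothetical: print a content-versioned URL for RELPATH under IP.
# Hash choice (md5) and length (5 hex chars) are illustrative assumptions.
versioned_url() {   # versioned_url IP RELPATH
  local hash
  [ -f "$1/$2" ] || return 1
  hash=$(md5sum < "$1/$2" | cut -c1-5)
  printf '/w/%s?%s\n' "$2" "$hash"
}

# Two stand-in "branch" checkouts containing the same file contents.
a=$(mktemp -d); b=$(mktemp -d)
mkdir -p "$a/skins" "$b/skins"
printf 'icon-bytes' > "$a/skins/icon.png"
printf 'icon-bytes' > "$b/skins/icon.png"
if [ "$(versioned_url "$a" skins/icon.png)" = "$(versioned_url "$b" skins/icon.png)" ]; then
  echo "same URL in both checkouts"
fi
rm -rf "$a" "$b"
```

Because the branch directory never appears in the output, removing an old checkout cannot break a cached URL as long as some live checkout still has the same file.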

With this I believe we can safely remove branches 8 days after they were last used for a wiki. And quite possibly we are already doing so.

> With this I believe we can safely remove branches 8 days after they were last used for a wiki.

Hurrah.

> And quite possibly we are already doing so.

We still have the last five branches in production. We can "fix" that by changing the instructions at https://wikitech.wikimedia.org/wiki/Heterogeneous_deployment/Train_deploys#Clean_up_old_stuff and alerting train deployers to the changed expectations.
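The "keep only the last N branches" step could be scripted roughly as follows. This is a sketch under assumptions: branch directories are named like php-1.34.0-wmf.24, GNU sort -V is available for version ordering (so that wmf.9 sorts before wmf.10), and prune_candidates is a hypothetical name. It only counts branches; it does not check when each branch was last used by a wiki, which still has to be confirmed before anything is deleted.

```shell
# Hypothetical: read branch names on stdin and print all but the newest KEEP,
# i.e. the candidates for cleanup. Version ordering relies on GNU sort -V.
prune_candidates() {   # prune_candidates KEEP
  sort -V | awk -v keep="$1" '{ b[NR] = $0 } END { for (i = 1; i <= NR - keep; i++) print b[i] }'
}

printf '%s\n' php-1.34.0-wmf.20 php-1.34.0-wmf.24 php-1.34.0-wmf.21 \
  php-1.34.0-wmf.23 php-1.34.0-wmf.22 | prune_candidates 2
# prints php-1.34.0-wmf.20, php-1.34.0-wmf.21 and php-1.34.0-wmf.22, one per line
```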

Jdforrester-WMF renamed this task from "Static asset time on disk" to "Reduce static asset time on disk from five trains' worth to two" on Oct 1 2019, 3:48 PM.

Clearer title.