MediaWiki caching

WMF uses Squid to cache wiki content. We rely on the X-Vary-Options header that MediaWiki emits, which tells the Squid reverse proxies which HTTP request headers they should split their cache of responses on.

We use Varnish to cache bits.wikimedia.org content: CSS and JS resources, and some images.

Cache headers

(See the HTTP specs for more formal wording)

Headers are sent mainly in OutputPage.php, function sendCacheControl(), around line 317. The headers sent depend mainly on the action (setSquidMaxage = $wgSquidMaxage in index.php for view and history) and on whether the browser sends a cookie.
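As a rough sketch, the decision described on the rest of this page boils down to the following (this is not the real OutputPage code; the max-age value and the cookie test are placeholders):

<?php
// Simplified sketch of the Cache-Control decision described on this page;
// not the real sendCacheControl() implementation. $squidMaxage stands in
// for $wgSquidMaxage, $hasCookie for "the browser sent a cookie".
function buildCacheControl( $squidMaxage, $hasCookie ) {
    if ( $hasCookie ) {
        // Session traffic: the Squids must not cache it; the browser may
        // keep a copy but has to revalidate on every view.
        return 'private, must-revalidate, max-age=0';
    }
    // Anonymous views: the Squids may cache for $wgSquidMaxage seconds;
    // the browser still revalidates with If-Modified-Since.
    return "s-maxage=$squidMaxage, must-revalidate, max-age=0";
}

header( 'Cache-Control: ' . buildCacheControl( 2678400, !empty( $_COOKIE ) ) );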

Headers explained

Last-Modified

This is required for client-side caching; without it, browsers don't know what to base their If-Modified-Since requests on. If the page hasn't changed, the Squid responds with just a 304 (Not Modified) status code, so only the response code and headers are transferred.
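For example, a revalidation round trip looks roughly like this (URL and dates are made up):

GET /wiki/Example_page HTTP/1.1
Host: en.wikipedia.org
If-Modified-Since: Tue, 01 Mar 2011 12:00:00 GMT

HTTP/1.1 304 Not Modified
Date: Tue, 01 Mar 2011 12:34:56 GMT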

Cache-Control

s-maxage

Tells intermediate caches such as the Squids how long they should consider the content valid without ever checking back. This needs to be hidden from caches we can't purge; otherwise users won't see changes. This is the reason for a header_access rule on the Squids which replaces any Cache-Control header with one that only allows client caching:

Cache-Control: private, s-maxage=0, max-age=0, must-revalidate
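In squid.conf terms this amounts to something like the following (illustrative only, not the exact production rules; the directive names are from Squid 2.x, where header_replace rewrites headers that header_access has denied):

header_access Cache-Control deny all
header_replace Cache-Control private, s-maxage=0, max-age=0, must-revalidate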

max-age

How long clients (browsers) should consider the content up to date. We allow clients to keep the page (the 'private' directive allows this), but tell them to send a conditional If-Modified-Since request. For this, of course, the Last-Modified header is needed; we set it to the last modification time or, if we don't have it, to the current time minus one hour. Images and stylesheets (including the generated ones that represent the user's preference selections) have max-age > 0 to avoid reloading them on each request. This is why users have to refresh their browser cache after changing their preferences. (Is there a way to force a client to re-request something using JavaScript?)
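Such an image or generated stylesheet might, for example, be served with a non-zero max-age along these lines (the value is illustrative):

Cache-Control: max-age=2678400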

private

Allows browsers to cache the content; shared caches must not store a response marked private.

Putting it together

Cache-Control: s-maxage=($wgSquidMaxage), must-revalidate, max-age=0

Allows caching on the Squids (s-maxage), which replace it with

Cache-Control: private, s-maxage=0, max-age=0, must-revalidate

for all anonymous visitors without a session, i.e. those who don't send a cookie. Second-tier Squids are allowed to receive the original headers through a special rule in squid.conf that matches their IPs. After the first visit to an edit page, or after logging in, the user sends a cookie, and MediaWiki also sends no s-maxage to the Squids so they don't cache the page:

Cache-Control: private, must-revalidate, max-age=0

This again allows browsers to cache the page while forcing them to check for changes on each page view.
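The second-tier exception mentioned above could look roughly like this in squid.conf (the ACL name and subnet are made up; the directive names are again Squid 2.x):

acl second_tier_caches src 10.0.0.0/16
header_access Cache-Control allow second_tier_caches
header_access Cache-Control deny all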

Vary

Tells downstream proxy caches to key the cached content on the listed request headers: if those header values differ, another copy is served for the same URL. For example, we use

Vary: Accept-Encoding, Cookie

to make sure logged-in users (who send a cookie) get pages with their user name and prefs (the Cookie part), and that clients which don't support gzip content encoding don't get compressed pages. I think there's some support for transparent decompression in Squid 3, so it might not need to store separate copies. See also: Vary in RFC 2616 and HTTP State Management Mechanism.

In addition, if $wgUseXVO is set (it is on all WMF wikis), OutputPage sends an

X-Vary-Options:

header that gives additional guidance to Squid caches (with Wikimedia patches).
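The value lists the headers to vary on, optionally restricted to particular substrings, so the cache is only split when a relevant cookie is present rather than on any Cookie header at all, as plain Vary: Cookie would do. An example value might look like this (the cookie names are illustrative):

X-Vary-Options: Accept-Encoding;list-contains=gzip,Cookie;string-contains=enwikiToken;string-contains=enwikiLoggedOut;string-contains=enwiki_session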