Multicast HTCP purging

Multicast HTCP purging is a method of purging Varnish cache objects using multicast HTCP packets.

Request flow

  • A MediaWiki instance in eqiad detects that a purge is needed. It sends an HTCP purge request to a multicast group for each individual URI that needs to be purged.
  • Native multicast routing is enabled in eqiad and pmtpa, and multicast packets should route natively between the two datacenters.
  • Multicast is sent to esams via a multicast->unicast->multicast relay located in eqiad (as of 2013-11-04).
  • All Squid and Varnish caches subscribe to the multicast feed.

Note that multicast HTCP is a one-way protocol: requests are fired and forgotten. If there is a problem anywhere in the system, the HTCP origin has no way of knowing there was a failure, and thus assumes that the request went through.
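As an illustration, here is a minimal Python sketch of one such purge datagram, assembled from the RFC 2756 packet layout (opcode CLR = 4) and the group, port, and TTL values described on this page. This is a best-effort reading of the protocol, not the production sender, and the exact bytes MediaWiki emits may differ in detail:

import random
import socket
import struct

HTCP_GROUP = '239.128.0.112'  # purge multicast group used on this cluster
HTCP_PORT = 4827
HTCP_OP_CLR = 4               # CLR opcode from RFC 2756

def countstr(s):
    # HTCP COUNTSTR: a 16-bit length followed by the bytes themselves
    return struct.pack('!H', len(s)) + s

def htcp_clr_packet(url, method=b'NONE'):
    # SPECIFIER: METHOD, URI, VERSION, REQ-HDRS (empty)
    specifier = (countstr(method) + countstr(url.encode('utf-8')) +
                 countstr(b'HTTP/1.0') + countstr(b''))
    # DATA section: length, opcode/response byte, reserved byte,
    # transaction id, then CLR's 16-bit reserved word and the specifier
    data_len = 8 + 2 + len(specifier)
    data = struct.pack('!HBBIH', data_len, HTCP_OP_CLR, 0,
                       random.getrandbits(32), 0) + specifier
    # Whole message: total length, version 0.0, the data section, and an
    # empty AUTH section (just its own 16-bit length field, value 2)
    return (struct.pack('!HBB', 4 + data_len + 2, 0, 0) + data +
            struct.pack('!H', 2))

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 2)
# One datagram per URI, fire and forget: no reply is requested or read
sock.sendto(htcp_clr_packet('http://en.wikipedia.org/wiki/Main_Page'),
            (HTCP_GROUP, HTCP_PORT))

Using NONE as the method relies on the Squid modification described in the next section; a stock HTCP implementation would expect a real HTTP method such as HEAD.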

HTCP modifications to Squid

Mark Bergsma modified the HTCP support in Squid to do the following:

  • work without requiring HTCP CLR responses
  • work at all when not requesting HTCP CLR responses
  • use a different store searching algorithm instead of htcpCheckHit(), which was intended for finding cache entries for URI hits instead of URI purges
  • allow the simultaneous removal of both HEAD and GET entries with a single HTCP request, by specifying NONE as the HTTP method

The Squids are all configured with the following line:

mcast_groups 239.128.0.112

to have them join the relevant multicast group and receive all the purge requests.
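Whether a given cache host has actually joined the group can be double-checked on the host itself, for example with (a suggestion, not a documented procedure here):

netstat -gn | grep 239.128.0.112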

Varnish

Varnish relies on a separate listener daemon (varnishhtcpd) to listen for purge requests and act on them.
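The daemon's internals are not documented on this page, but its general shape can be sketched in Python as follows. The crude URL extraction (instead of a full RFC 2756 parser) and the assumption that the local Varnish accepts PURGE requests on 127.0.0.1:80 are illustrative guesses, not the production configuration:

import http.client
import re
import socket
import struct

GROUP, PORT = '239.128.0.112', 4827

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(('', PORT))
# Join the purge multicast group on the default interface
mreq = struct.pack('4s4s', socket.inet_aton(GROUP), socket.inet_aton('0.0.0.0'))
sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

while True:
    packet, _ = sock.recvfrom(65535)
    # Pull the first http(s) URL out of the datagram
    m = re.search(rb'https?://[^\x00-\x20]+', packet)
    if not m:
        continue
    url = m.group().decode('utf-8', 'replace')
    host = url.split('/')[2]
    path = '/' + url.split('/', 3)[3] if url.count('/') >= 3 else '/'
    # Hand the purge to the local Varnish as an HTTP PURGE request
    conn = http.client.HTTPConnection('127.0.0.1', 80, timeout=2)
    conn.request('PURGE', path, headers={'Host': host})
    conn.getresponse().read()
    conn.close()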

MediaWiki

MediaWiki was extended with a SquidUpdate::HTCPPurge method that takes an HTCP multicast group address, an HTCP port number, and a multicast TTL (see DefaultSettings.php), and sends all URLs to purge to that group. It can't make use of persistent sockets, but the overhead of setting up a UDP socket is minimal, and it doesn't have to worry about handling responses.

All Apaches are configured through CommonSettings.php to send HTCP purge requests to the multicast group address 239.128.0.112, with a multicast Time To Live of 2 (instead of the default of 1) because the messages need to cross a subnet/router boundary.

udpmcast relay

udpmcast is a small application-level multicast tool written in Python. It joins a given multicast group on startup, listens on a specified UDP port, and then forwards all received packets to a given set of (unicast or multicast) destinations.

Its options can be found by running it with the -h argument.

As of November 2013, chromium is running udpmcast via /etc/rc.local and forwarding to dobson. The group is 239.128.0.112, port 4827.

udpmcast.py supports forwarding rules, where it selects the destination address list based on the source address that sent the packet. These forwarding rules can be specified as a Python dictionary on the command line; see the sketch below.
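A stripped-down Python sketch of that core loop, with a hard-coded forwarding-rules dictionary standing in for the command-line options (the addresses below are illustrative, not the real rules):

import socket
import struct

GROUP, PORT = '239.128.0.112', 4827

# Destination lists keyed by source address (exact-match here for
# simplicity); DEFAULT applies to any other source
RULES = {'10.64.0.169': [('208.80.152.173', PORT)]}
DEFAULT = [('208.80.152.173', PORT)]

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(('', PORT))
# Join the multicast group so the purge stream is delivered here
mreq = struct.pack('4s4s', socket.inet_aton(GROUP), socket.inet_aton('0.0.0.0'))
sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

while True:
    packet, (src, _) = sock.recvfrom(65535)
    # Forward the datagram unchanged to every destination for this source
    for dest in RULES.get(src, DEFAULT):
        sock.sendto(packet, dest)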

Multicast breakage troubleshooting

(current as of November 2013)

This is for troubleshooting the UDP multicast-to-unicast proxy that enables purges to work in pmtpa.

First, run tcpdump on chromium:

tcpdump -n -v udp port 4827 and host 239.128.0.112

Is there a huge amount of traffic? If yes, it's not the network on the eqiad side. If no, it's the network on the pmtpa side.

If there is a lot of traffic, then run tcpdump on hooft:

tcpdump -n -v udp port 4827 and host 208.80.152.173

Do you see a huge amount of traffic? If yes, it's not the network. For the rest of this walkthrough, suppose chromium sees no traffic.

Next, make sure udpmcast is listening:

root@chromium:/var/log# netstat -nl | grep 4827
udp        0      0 0.0.0.0:4827            0.0.0.0:*

Then check whether chromium can receive multicast traffic on the correct group. Start iperf on chromium:

iperf -s -B 239.128.0.112 -u -p 1337 -i 5

Then go to a varnish machine (like cp1041) and start up iperf:

iperf -c 239.128.0.112 -b 50K -t 300 -T 5 -u -p 1337 -i 5

Notice the port is NOT one used by a real service; this is important. (The -T 5 flag sets the multicast TTL of the test stream so it can cross the routers between the two hosts.)

You should see output on chromium like:

root@chromium:~# iperf -s -B 239.128.0.112 -u -p 1337 -i 5
------------------------------------------------------------
Server listening on UDP port 1337
Binding to local address 239.128.0.112
Joining multicast group  239.128.0.112
Receiving 1470 byte datagrams
UDP buffer size:   122 KByte (default)
------------------------------------------------------------
[  3] local 239.128.0.112 port 1337 connected with 10.64.0.169 port 8442
[ ID] Interval       Transfer     Bandwidth       Jitter   Lost/Total Datagrams
[  3]  0.0- 5.0 sec  30.1 KBytes  49.4 Kbits/sec  0.038 ms    0/   21 (0%)
[  3]  5.0-10.0 sec  30.1 KBytes  49.4 Kbits/sec  0.025 ms    0/   21 (0%)
[  3] 10.0-15.0 sec  30.1 KBytes  49.4 Kbits/sec  0.023 ms    0/   21 (0%)


If you do not, multicast has gone wrong.

Try this step again, but change the group address (e.g. to 239.128.0.115). If this still does not work, multicast is broken between the datacenters.

Purge a URL

On terbium, run:

$ echo 'https://example.org/foo?x=y' | mwscript purgeList.php

History

Previous methods of Squid purging implemented in MediaWiki, SquidUpdate::purge and SquidUpdate::fastPurge, used HTTP PURGE requests over unicast TCP connections from all Apaches to all Squids. This had a few drawbacks:

  • All Apaches needed to be able to connect to all Squids
  • There was overhead both from handling Squid's replies and from setting up the TCP connections themselves

The biggest drawback was that it was plain slow. Profiling runs showed that the current method is about 8000 times faster than the older fastPurge method.
