DNS

From Wikitech

This page describes Wikimedia's DNS setup. Wikimedia use two separate kinds of DNS servers: authoritative nameservers (which respond to queries from third-party nameservers for our domains) and resolvers (which resolve DNS queries for our own servers).

Need to make changes to Wikimedia zones? See HOWTO in this page's TOC.

Authoritative nameservers

Wikimedia have three authoritative DNS servers, all running gdnsd:

  • ns0.wikimedia.org - 208.80.154.238 (radon)
  • ns1.wikimedia.org - 208.80.153.231 (baham)
  • ns2.wikimedia.org - 91.198.174.239 (eeden)

WARNING: the information below is outdated wherever you see PowerDNS references. This document also still needs cleanup.

The servers run gdnsd with the geoip plugin, which is responsible for geographic DNS.

Zonefiles and other configuration are replicated through the use of git fetch and git merge in a set of update scripts. In an emergency, any server can also be synced from any of the others.

All configuration files can be found in

/etc/gdnsd/

on all three hosts, with a separate conf file used just for syncing the zonefiles and such, in /etc/wikimedia-authdns.conf.

The main gdnsd configuration file is /etc/gdnsd/config. It is generated from files in our operations/dns git repo, as are the zone files.

Installation steps for a Wikimedia authoritative DNS server

This is now (mostly) managed by Puppet using the role::authdns classes and the authdns module. You'll need to add the new nameserver to the list in role::authdns::base and set up a role::authdns::nsX class for it, following the layout of the existing classes.

A puppet run will install gdnsd, the sync/update scripts, the config files, and the user and ssh keys for git fetch/merge between hosts.

Recursors

pdns_recursor is run on dobson and mchenry in Tampa, and on nescio in esams. These rely on the dns::recursor class in dns.pp.

Domain templates

Because Wikimedia have a lot of zones that essentially contain the same records (aliases for wikipedia.org and other projects), the old DNS setup used a single zonefile for multiple zones. That has the advantage that just a single change in a zonefile affects many zones. Unfortunately, it doesn't permit the use of $ORIGIN lines in the zonefile. In the new DNS setup, each zone gets its own zonefile, but multiple zonefiles can be generated from a single zone template.

The zone templates are (regular) files in the operations/dns git repo in the templates directory.

Each regular file in this directory corresponds to a zone with the same name. Each symbolic link to a regular file in this directory corresponds to a domain alias. So, in this example:

# ls -l templates/mediawiki*
lrwxrwxrwx    1 root     root           13 Jun 19 15:52 templates/mediawiki.com -> mediawiki.org
lrwxrwxrwx    1 root     root           13 Jun 19 15:52 templates/mediawiki.net -> mediawiki.org
-rw-r--r--    1 root     root         1500 Jun 19 15:12 templates/mediawiki.org

...one zone mediawiki.org is listed, with two alias zones, mediawiki.com and mediawiki.net.
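
The file/symlink convention above can be checked mechanically. The following is a minimal sketch, not the logic of the actual generation script: classify_templates is a hypothetical helper that treats regular files as zones and symbolic links as aliases, and ignores subdirectories such as helpers.

```python
import os

def classify_templates(template_dir):
    """Split a templates directory into real zones and alias zones.

    Regular files are zones; symbolic links are aliases of the zone
    they point at. Subdirectories (such as helpers) are ignored.
    """
    zones = []
    aliases = {}
    for name in sorted(os.listdir(template_dir)):
        path = os.path.join(template_dir, name)
        if os.path.islink(path):
            # The link target names the zone being aliased.
            aliases[name] = os.path.basename(os.readlink(path))
        elif os.path.isfile(path):
            zones.append(name)
    return zones, aliases
```

Run against a directory laid out like the listing above, this would report mediawiki.org as the only zone and map mediawiki.com and mediawiki.net to it as aliases.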

Variables and macros

Within the zone template, a few predefined variables and macros can be used. They are substituted when the actual zonefiles are generated from the template. These include:

{{ zonename }}
The actual zone qname (FQDN) of the zonefile to be generated
{{ serial }}
The SOA serial number, derived from the current date and hour in YYYYMMDDHH format
{{ langlist(...) }}
A list of language subdomain CNAMEs, i.e. a list of all language abbreviations for all languages any Wikimedia project has, generated from helpers/langs.tmpl.
$TTL
This should be set to the desired time to live at the top of the zone file template.
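
To illustrate the YYYYMMDDHH serial format, here is a small sketch. soa_serial is a hypothetical name, and whether the real script uses local time or UTC is not stated here, so the caller passes the datetime explicitly.

```python
from datetime import datetime

def soa_serial(now):
    """Return an SOA serial in YYYYMMDDHH form for the given datetime."""
    return int(now.strftime("%Y%m%d%H"))

print(soa_serial(datetime(2013, 6, 19, 15, 52)))  # prints 2013061915
```

One consequence of this format is that two generations within the same hour produce the same serial.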

The actual zonefiles are generated from the zone templates by a Python script, authdns-gen-zones.py from the authdns puppet module. Relying on the templating system jinja2, it reads all zone templates from the template directory, applies string and macro substitutions, and writes the result to the

/etc/gdnsd/zones

directory, where gdnsd can read them as regular zonefiles.
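
The idea of rendering one template into several zonefiles can be sketched as follows. This is a deliberately simplified stand-in, not the real Jinja2-based script: render_zone is a hypothetical function that handles only {{ zonename }} and {{ serial }} by plain string replacement, and the template text is invented for illustration.

```python
def render_zone(template_text, zonename, serial):
    """Substitute the two simplest template variables by string replacement."""
    return (template_text
            .replace("{{ zonename }}", zonename)
            .replace("{{ serial }}", str(serial)))

# Hypothetical template content, for illustration only.
template = "; zone {{ zonename }}, serial {{ serial }}\n"

# The zone and its aliases all render from the same template text.
for name in ("mediawiki.org", "mediawiki.com", "mediawiki.net"):
    print(render_zone(template, name, 2013061915), end="")
```

Each alias gets its own output with its own zone name, which is what lets every zone have a separate zonefile while sharing one template.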

Because a templating system is used, you can also add entries in the zone file templates like these:

{% for i in range(1, 10) %}
asw{{ i }}-pmtpa	1H	IN A		10.0.1.{{ i }}
{%- endfor %}

Note that if the range is (a,b) then the first entry will be for a but the last entry will be for b-1.
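
Jinja2's range behaves like Python's built-in range, so the half-open interval can be checked directly in Python. The sketch below rebuilds the records that the template loop above would produce; switch_records is a hypothetical helper name.

```python
def switch_records(start, stop):
    """Build the A records that the template loop above would generate."""
    # range(start, stop) includes start but stops before stop.
    return [f"asw{i}-pmtpa\t1H\tIN A\t\t10.0.1.{i}" for i in range(start, stop)]

records = switch_records(1, 10)
print(len(records))   # number of records generated
print(records[0])     # first record, for i = 1
print(records[-1])    # last record, for i = 9
```

To include asw10-pmtpa as well, the template would need range(1, 11).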

authdns-git-pull

/usr/sbin/authdns-git-pull updates the git repo on the local dns server from ... (finish this description someday)

authdns-update

/usr/sbin/authdns-update is a simple shell script that automates the invocations of the scripts above. It goes through the following steps on each of ns0-2 (this list still needs updating):

  1. ssh to the host
  2. git pull the templates from the operations/dns repo via authdns-git-pull
  3. generate the zone files from the zone template files via authdns-gen-zones
  4. update the gdnsd config files from the local git repo just updated
  5. run sanity checks and reload the gdnsd daemon

Basically, authdns-update takes care of everything after you've edited and merged the zonefiles.

authdns-local-update

/usr/sbin/authdns-local-update is used on any of the servers for pulling in updates from any other (presumably up to date) dns server. It can be used to bring a server back up to date after e.g. downtime or a software install/update. It is also used in this way by puppet during initial setup.

Geographic DNS

Geographic DNS makes sure that clients end up using the Wikimedia cluster closest to them, by varying DNS responses based on the (country of the) resolver IP querying. This is handled by the gdnsd geoip plugin. The config file is config-geo in the operations/dns repo. Our geoip setup makes use of /usr/share/GeoIP/GeoIPCity.dat (IPv4) and /usr/share/GeoIP/GeoIPv6.dat (IPv6). These are pulled from the volatile directory on the puppet master, which is updated regularly by cron. See the geoip module for more information.

HOWTO

This section briefly explains how to do the most common DNS changes.

Change GeoDNS

For example, when a certain cluster is down/unreachable, and you want to move all traffic to the others.

Edit the config-geo file in the operations/dns repo, commit, and run authdns-update from any of the dns servers.

Changing records in a zonefile

  • This is handled via the git repo operations/dns
  • Edit the template file templates/zonename locally and check into git, and git review (for gerrit review)
  • Merge your change in gerrit, then log in to ns0.wikimedia.org and run authdns-update. This will pull from operations/dns and generate zonefiles and gdnsd configs on each nameserver.
  • This no longer requires you to forward your own key; the systems are set up with their own trusted keys for the sync.
  • Once the script completes, it's a good habit to query all three DNS servers to make sure your change has been correctly deployed
  • For example: for i in 0 1 2 ; do dig @ns${i}.wikimedia.org -t any my-changed-record.wikimedia.org ; done
  • If any auth DNS server failed to respond, restart it with /etc/init.d/gdnsd restart (though, unlike before with pdns, this should no longer happen)
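
The verification loop above can also be scripted. This is a minimal sketch, assuming dig is installed and the three nameserver names listed earlier on this page; dig_command and query_all are hypothetical helper names.

```python
import subprocess

NAMESERVERS = ["ns0.wikimedia.org", "ns1.wikimedia.org", "ns2.wikimedia.org"]

def dig_command(nameserver, record, rtype="any"):
    """Build the dig invocation for one authoritative nameserver."""
    return ["dig", f"@{nameserver}", "-t", rtype, record]

def query_all(record, rtype="any"):
    """Run dig against every nameserver and return {nameserver: output}."""
    results = {}
    for ns in NAMESERVERS:
        proc = subprocess.run(dig_command(ns, record, rtype),
                              capture_output=True, text=True, timeout=30)
        results[ns] = proc.stdout
    return results
```

Calling query_all("my-changed-record.wikimedia.org") returns the raw dig output per server, which still has to be inspected by hand, since the full output contains per-query details that differ between servers.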

Adding a new zone

  1. First, decide if this new zone will use a new, independent zonefile, or will be an alias of another zone
    independent zonefile
    Create the new zone template in the operations/dns repository as templates/zonename. (Copy an existing, relatively clean zonefile like wiktionary.org to start with).
    zone alias
    Make a symbolic link templates/aliasname for the alias to the zone being aliased.
  2. git add the file in, commit, and review on gerrit.
  3. Run authdns-update on ns0 (or any nameserver).
  4. Query all three ns servers to verify that your change took correctly.
  5. If any auth DNS server failed to respond, restart it with /etc/init.d/gdnsd restart

Removing a zone

  • git rm the appropriate file, and merge on gerrit.
  • Log in to ns0 and run authdns-update.

Adding a new (language) wiki

Doublecheck this please

  1. Add the language code to templates/helpers/langs.tmpl in the operations/dns repo
  2. Run authdns-update
  3. Query all three ns servers to verify that your change took correctly.
  4. If any auth DNS server failed to respond, restart it with /etc/init.d/gdnsd restart (should not happen)
  • Known issue: if all you do is edit langs.tmpl without touching an actual zone template, the zone files will not be regenerated. Work around this by touching the zone template in some way (open ticket RT #8778).

If a certain nameserver is unreachable

When a certain nameserver is unreachable, the others can still be updated from any of the other servers, by running authdns-update there. To skip the unreachable server in the update process, use:

# authdns-update -s "server list"

where server list is a space-separated list of FQDNs. Do not forget the quotes: the script accepts only one argument after -s.

  1. Query all three ns servers to verify that your change took correctly.
  2. If any auth DNS server failed to respond, restart it with /etc/init.d/gdnsd restart
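
The quoting requirement for -s can be illustrated with a short sketch. It shows only how to pass the whole server list as a single shell argument and makes no assumption about which servers belong in the list; authdns_update_command is a hypothetical helper name.

```python
import shlex

def authdns_update_command(servers):
    """Build an authdns-update invocation with the server list as ONE argument."""
    return shlex.join(["authdns-update", "-s", " ".join(servers)])

print(authdns_update_command(["ns0.wikimedia.org", "ns2.wikimedia.org"]))
# prints: authdns-update -s 'ns0.wikimedia.org ns2.wikimedia.org'
```

shlex.join uses single quotes, which the shell treats the same as the double quotes shown above for a plain list of hostnames.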

Resolvers

Each cluster has its own set of recursive resolvers:

eqiad
chromium (ip?), hydrogen (ip?)
esams
maerlant (ip?), nescio (ip?)
codfw
acamar (ip?), achernar (ip?)

Each resolver runs the PowerDNS recursor, using package pdns-recursor in the Wikimedia APT repository (universe). The configuration file is:

/etc/powerdns/recursor.conf

Some runtime control is available through rec_control, see http://docs.powerdns.com/rec-control.html

With puppet, setting variable $dns_recursor_ipaddress to the recursor service ip, and including class dns::recursor suffices to make a server into a PowerDNS recursor.

The following settings have been modified from the default:

allow-from

Lists the IP ranges that are allowed to query this recursor. 127/8 and internal and external Wikimedia IP ranges are listed.

forward-zones

Forwards queries for the internal zones to the authoritative nameserver(s):

forward-zones= wmnet=208.80.152.130;208.80.152.174;91.198.174.4, 10.in-addr.arpa=208.80.152.130;208.80.152.174;91.198.174.4
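
The structure of this setting is a comma-separated list of zone=servers entries, with the servers for each zone separated by semicolons. A minimal sketch that splits the value shown above into a mapping (parse_forward_zones is a hypothetical helper, not part of PowerDNS):

```python
def parse_forward_zones(value):
    """Parse a forward-zones value into {zone: [server, ...]}."""
    zones = {}
    for entry in value.split(","):
        zone, _, servers = entry.strip().partition("=")
        zones[zone.strip()] = [s.strip() for s in servers.split(";") if s.strip()]
    return zones

setting = ("wmnet=208.80.152.130;208.80.152.174;91.198.174.4, "
           "10.in-addr.arpa=208.80.152.130;208.80.152.174;91.198.174.4")
for zone, servers in parse_forward_zones(setting).items():
    print(zone, servers)
```

This makes it easy to confirm that both internal zones forward to the same three servers.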

local-address

Comma separated list of IPs on which the recursor should listen for queries. List the (external) service IP, e.g. 208.80.152.131.

setgid, setuid

Change uid/gid to pdns. Unfortunately this account is not created by the Debian package, so use:

# adduser --system --no-create-home --group --disabled-password pdns

Statistics

This is handled by the puppet class dns::recursor::statistics

To set up statistics for the recursor, use the following steps:

  1. Install rrdtool
  2. Copy the directory /usr/local/powerdnsstats off one of the other recursors (dobson, mchenry)
  3. Install lighttpd or apache if not already present
  4. mkdir /var/www/pdns as root
  5. Run cd /var/www/pdns && /usr/local/powerdnsstats/create && wget http://dobson.wikimedia.org/pdns/index.html as root
  6. Set up the following cron job, in /etc/cron.d/pdns-recursor:
*/5 * * * *     root    cd /var/www/pdns/ && /usr/local/powerdnsstats/update && /usr/local/powerdnsstats/makegraphs >/dev/null

IPv6

Besides looking at the IPv6 page, note that requests for upload.knams.wikimedia.org are handled by /usr/local/lib/selective-answer.py on the dns hosts (see trunk/tools/selective-answer/), and that at the moment you have to be listed in /etc/powerdns/participants in order to get the AAAA record back.

HOWTO

Remove a record from the DNS resolver caches

If you have just added or updated a DNS record on the authoritative nameservers, it may still be cached on the (unrelated) DNS resolvers used by our servers. To clear a record from the cache, use:

# rec_control wipe-cache record-name

on all the DNS resolvers. This will also clear any negative cache records. If you need to clear a PTR record, be sure to use the actual record name, e.g.

 # rec_control wipe-cache 122.36.64.10.in-addr.arpa.

(with the trailing '.').
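
The PTR record name for an address can be derived rather than typed by hand. A small sketch using Python's standard ipaddress module (ptr_name is a hypothetical helper name; the trailing dot is appended because reverse_pointer omits it):

```python
import ipaddress

def ptr_name(ip):
    """Return the fully qualified PTR record name for an IP address."""
    return ipaddress.ip_address(ip).reverse_pointer + "."

print(ptr_name("10.64.36.122"))  # prints 122.36.64.10.in-addr.arpa.
```

The result can be passed directly to rec_control wipe-cache.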
