Dumps/Dump servers

XML Dump servers

Hardware

We have two hosts:

  • Dataset1001 in D.C., production:
    Hardware/OS: PowerEdge R510, Ubuntu 14.04, 2 MD1200 arrays, 16GB RAM, 1 quad-core Xeon E5640 CPU
    Disks: 24 2TB disks in two 12-disk RAID 6 volumes, plus 12 3TB disks in a third RAID 6 volume; a 120GB partition for the OS, 1GB for swap, and the rest combined into one 57TB LVM volume
    Note that this host also serves other public datasets, such as some Picture of the Year (POTY) files, the pagecount stats, etc.
  • Ms1001 in eqiad, spare:
    Hardware/OS: PowerEdge R510, Ubuntu 12.04, 64GB RAM, 2 8-core Xeon E5640 CPUs
    Disks: 48 2TB disks in four 12-disk RAID 6 volumes; an 11GB partition for the OS, 1GB for swap, and the rest combined into one 55TB LVM volume

Services

The production host serves the dump files and other public datasets over HTTP using nginx. It also acts as an rsync server for our mirrors and for labs.
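
As a rough illustration, a mirror might pull from the rsync server with something like the command below. The "dumps" module name, the fully qualified hostname, and the local path are assumptions for this sketch; check the rsyncd.conf on the host for the real module names.

    # Hypothetical mirror pull; the "dumps" module name is an assumption,
    # not necessarily what rsyncd.conf on the host actually exports.
    rsync -av --delete rsync://dataset1001.wikimedia.org/dumps/ /srv/mirror/dumps/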

Deploying a new host

You'll need to set up the RAID arrays by hand. We typically have two arrays, so set up two RAID 6 arrays, combine them with LVM into one giant volume, and format it as XFS, as in the sketch below.
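
A minimal sketch of those steps, assuming the two RAID 6 arrays appear as /dev/sdb and /dev/sdc (the device names, volume group name, and mount point are all placeholders, not the actual production values):

    # Assumed device names for the two RAID 6 arrays; adjust to match the host.
    pvcreate /dev/sdb /dev/sdc
    vgcreate data /dev/sdb /dev/sdc
    # One giant logical volume spanning both arrays, formatted as XFS.
    lvcreate -l 100%FREE -n data data
    mkfs.xfs /dev/data/data
    mkdir -p /data
    mount /dev/data/data /data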

Install in the usual way (add the host to puppet, copying a pre-existing production dataset host stanza, set everything up for PXE boot, and go). You may or may not want to include the download mirror classes from puppet for the new host. If the new host replaces the current download mirror, make sure you tweak the cron job that generates the mirror file list; see Dumps/Snapshot hosts#Other_tasks for that and for other jobs you might need to check.
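
Once the host is up, a quick way to sanity-check which cron jobs puppet actually installed (assuming they run as root; the grep pattern is only a guess at how the mirror file list entry is named):

    # List root's cron entries and look for the mirror file list job.
    crontab -l -u root | grep -i mirror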

Space issues

If we run low on space, we can keep fewer rounds of XML dumps; see Dumps#Space for how to do that.
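
Before changing how many rounds we keep, it helps to see where the space is going. A quick check, with the mount point and dump tree path as assumptions rather than the documented layout:

    # Free space on the dumps filesystem (mount point assumed to be /data).
    df -h /data
    # Largest per-wiki dump directories; the path is an assumption.
    du -sh /data/xmldatadumps/public/* | sort -rh | head -20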