Dumps/Snapshot hosts


Snapshot (XML dumps generation) cluster information

Hardware

These hosts generate the XML dumps. For information about the hosts that serve them, see Dumps/Dump servers.

We have one mini snapshot cluster. Eventually another will be set up in codfw.

In D.C.:

  • snapshot1001: primary, PowerEdge R815, Ubuntu 12.04.2, 64GB RAM, 4 8-core Opterons, 2 80GB HDs
  • snapshot1002: primary, PowerEdge R410, Ubuntu 12.04.2, 12GB RAM, 2 6-core Xeons, 500GB HD
  • snapshot1003: primary, PowerEdge R410, Ubuntu 12.04.2, 12GB RAM, 2 6-core Xeons, 500GB HD
  • snapshot1004: primary, PowerEdge R410, Ubuntu 12.04.2, 12GB RAM, 2 6-core Xeons, 500GB HD

The beefier server, snapshot1001 (with four 8-core CPUs), is dedicated to the English Wikipedia dumps.

Current setup

The dumps monitor, which cleans up stale locks and generates the index.html file, can run on either snapshot1002 or snapshot1004.

Worker nodes:

English Wikipedia dumps run on snapshot1001 via the dump scheduler.

All other dumps run on snapshot1002 and snapshot1004 via the dump scheduler.
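
The worker assignment above can be summarized as a small lookup. This is only an illustrative sketch, not a configuration file or script used by the dump scheduler: the constant names and the worker_hosts function are hypothetical, and it assumes the English Wikipedia is identified by the database name "enwiki".

```python
# Hypothetical summary of the host roles described on this page.
# None of these names come from the real dumps configuration.
ENWIKI_WORKERS = ["snapshot1001"]                  # dedicated English Wikipedia host
OTHER_WORKERS = ["snapshot1002", "snapshot1004"]   # all other wikis
MONITOR_CANDIDATES = ["snapshot1002", "snapshot1004"]  # monitor may run on either


def worker_hosts(wiki):
    """Return the hosts that, per this page, run dumps for the given wiki.

    Assumes the English Wikipedia is passed as "enwiki".
    """
    if wiki == "enwiki":
        return list(ENWIKI_WORKERS)
    return list(OTHER_WORKERS)


if __name__ == "__main__":
    for name in ("enwiki", "frwiki"):
        print(name, "->", ", ".join(worker_hosts(name)))
```

Keeping the mapping in one place like this makes it easy to see at a glance which host to check when a particular wiki's dump run stalls.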

For info on running the dumps, see Dumps#Starting_dump_runs.

Other tasks

All dump-related cron jobs run on snapshot1003. Central auth dumps are run from here, Wikidata JSON dumps are generated here, and so on.