Labs Eqiad Migration/Tools

From Wikitech

dc-migrate

This tool is puppetized in files/openstack/havana/virtscripts/dc-migrate. It must be run as root on virt0.

   root@virt0:~# ./dc-migrate <instancename> <projectname>

If all goes well, when the tool finishes there will be a new instance in eqiad with the same name. The instance in pmtpa is preserved in a shutdown state. The new instance will have puppet configured but will need a couple of puppet runs to stabilize; after an hour everything should be peachy.

dc-migrate is a bash script, and quite ugly. It provides some narrative about what it's doing, but it will also leak unimportant error messages. For example:

   ERROR: No server with a name or ID of 'testmigrate12' exists.
   (That error was a good thing.)

That's the script checking to make sure there isn't already an instance in eqiad by the same name. Don't worry!

   ERROR: Instance 0a299437-d834-4a4b-a4c9-c608096b5f77 in vm_state stopped. Cannot stop while the instance is in this state. (HTTP 400) (Request-ID: req-98b4736d-0d05-4339-9cf1-ecb3522c7ca4)

The script gives the pmtpa instance two chances to shut down, once via nova and once (gracelessly) via libvirt. That error just means that the nova shutdown succeeded. Don't worry!

   err: Could not call revoke: Could not find a serial number for i-0000007c.eqiad.wmflabs

That means that there wasn't already a puppet cert on virt0 for the new instance. Probably just as well. Don't worry!
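
When reading a long dc-migrate run, it can help to separate the three harmless messages above from anything unexpected. The following is a minimal sketch, not part of dc-migrate: the regular expressions are assumptions inferred from the sample output quoted here, and the function names are hypothetical.

```python
import re

# Patterns for the three harmless messages quoted above. These are
# assumptions inferred from the sample output, not taken from dc-migrate.
BENIGN_PATTERNS = [
    re.compile(r"ERROR: No server with a name or ID of '.+' exists\."),
    re.compile(r"ERROR: Instance \S+ in vm_state stopped\. "
               r"Cannot stop while the instance is in this state\."),
    re.compile(r"err: Could not call revoke: "
               r"Could not find a serial number for \S+"),
]


def is_benign(line):
    """Return True if the line matches one of the known-harmless messages."""
    return any(p.search(line) for p in BENIGN_PATTERNS)


def worrying_lines(output):
    """Return error-looking lines that are not on the known-harmless list."""
    return [
        line
        for line in output.splitlines()
        if re.match(r"\s*(ERROR|err):", line) and not is_benign(line)
    ]
```

Feeding the captured output of a run to worrying_lines() leaves only the error lines that are not covered by the explanations above.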

inter-datacenter communication

Instances in eqiad can talk to instances in pmtpa, and vice versa. Note that an unqualified name resolves to the instance in the local datacenter if one exists, and to nothing if it doesn't. To reach an instance in the other datacenter, use its fully-qualified name.

   andrew@testmigrate12:/data/project$ hostname -d # where are we?
   eqiad.wmflabs
   andrew@testmigrate12:~$ ping testmigrate12  # testmigrate12 in eqiad, if there is one
   PING testmigrate12.eqiad.wmflabs (10.68.16.4) 56(84) bytes of data.
   64 bytes from testmigrate12.eqiad.wmflabs (10.68.16.4): icmp_req=1 ttl=64 time=0.017 ms
   andrew@testmigrate12:~$ ping testmigrate12.pmtpa.wmflabs # testmigrate12 in pmtpa, if there is one
   PING testmigrate12.pmtpa.wmflabs (10.4.1.186) 56(84) bytes of data.
   64 bytes from 10.4.1.186: icmp_req=1 ttl=59 time=36.1 ms
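
Scripts that need to reach a specific datacenter can build the fully-qualified name explicitly instead of relying on the local search domain. This is a small sketch assuming the <instance>.<region>.wmflabs pattern visible in the ping output above; the helper name is hypothetical.

```python
# The two regions discussed on this page.
REGIONS = ("eqiad", "pmtpa")


def qualify(name, region):
    """Build a fully-qualified Labs hostname, e.g. testmigrate12.pmtpa.wmflabs.

    Names that already contain a dot are returned unchanged.
    """
    if region not in REGIONS:
        raise ValueError("unknown region: {}".format(region))
    if "." in name:
        return name
    return "{}.{}.wmflabs".format(name, region)
```

For example, qualify("testmigrate12", "pmtpa") gives the name used in the second ping above.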

wikitech region management

There are some region controls in /srv/org/wikimedia/controller/wikis/config/Local.php.

Right now wikitech is set so that only cloudadmins can view eqiad:

   $wgOpenStackManagerRestrictedRegions = array( 'eqiad' );

To give everyone access, just change that to:

   $wgOpenStackManagerRestrictedRegions = array();

Once eqiad is turned on, we will disable instance creation in pmtpa. To do that, modify the $wgOpenStackManagerReadOnlyRegions setting to make pmtpa read-only:

   $wgOpenStackManagerReadOnlyRegions = array( 'pmtpa' );

gluster file migration

I've started a big copy of gluster volumes to eqiad nfs. It's failing all over the place due to gluster, but with a little luck it will forge ahead and get at least some things copied over. The copies are stowed in <volname>/glustercopy.

There's a temporary keypair allowing ssh between labstore2 and labstore1001. In order to preserve this key, puppet is disabled on labstore1001.

The script is /root/andrew/volcopy.py on labstore2. It's running in a screen that I'm no longer attached to.

The logfile is /root/andrew/copyvols.log. A running tally of finished jobs is in /root/andrew/finishedvols.txt. The script checks finishedvols.txt as it works, so it's generally safe to stop and restart the script. The copying is pythonic but approximately this:

   tar --atime-preserve -czf - . | ssh labstore1001.eqiad.wmnet 'cd <dest>; tar xvzf -'
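
The resume behaviour described above can be sketched roughly as follows. This is not the contents of volcopy.py: the function names are hypothetical, the tally file is assumed to hold one volume name per line, and the remote side uses && instead of ; so that tar does not unpack into the wrong directory if the cd fails.

```python
import shlex


def load_finished(path):
    """Read the tally file; assumed format is one volume name per line."""
    try:
        with open(path) as f:
            return {line.strip() for line in f if line.strip()}
    except FileNotFoundError:
        return set()


def pending(volumes, finished):
    """Volumes still to copy, in their original order."""
    return [v for v in volumes if v not in finished]


def copy_command(dest, host="labstore1001.eqiad.wmnet"):
    """Build the tar-over-ssh pipeline for one volume.

    Meant to be run from the volume's mount point; dest is the
    destination directory on the remote host.
    """
    remote = "cd {} && tar xzf -".format(shlex.quote(dest))
    return "tar --atime-preserve -czf - . | ssh {} {}".format(
        host, shlex.quote(remote)
    )


def mark_finished(path, volume):
    """Append a volume to the tally once its copy has succeeded."""
    with open(path, "a") as f:
        f.write(volume + "\n")
```

A driver loop would call pending() once, run each copy_command() through a shell from the volume's mount point, and call mark_finished() only on a zero exit status, which is what makes stopping and restarting safe.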

This is not throttled, but we're reading from a gluster volume which is super slow so I doubt this will cause any degree of network saturation. andrew (talk) 15:34, 28 February 2014 (UTC)