Swift/How To

General Prep

Nearly all of these commands are best executed from a swift proxy host (eg ms-fe1.pmtpa.wmnet) and require either the master password or an account password. Both the master password (super_admin_key) and the individual users' passwords we use at Wikimedia are available in the swift proxy config file /etc/swift/proxy-server.conf or in the private puppet repository.

All of these tasks are explained in more detail and with far more context in the official swift documentation. This page is intended as a greatly restricted version of that information, directed specifically at tasks we'll need to do at WMF. For this reason I leave out many options and caveats and assume details such as the authentication type (swauth) in order to restrict it to what's correct for our installation. It may or may not be useful for a wider audience.

Set up an entire swift cluster

This is documented elsewhere: Swift/Setup_New_Swift_Cluster

Individual Commands - Interacting with Swift

Create a user / account

This assumes swauth is prepped (swauth-prep)

  • generate a password: pass=$(pwgen -s 12 1)
  • add the user: swauth-add-user -A http://127.0.0.1/auth/ -K $super_admin_key -a myaccount newuser $pass
    • (swift has multiple accounts, each account has users, each user has a password)
    • note - swift users' passwords are visible (in plaintext) to anybody with the super_admin_key
  • test it and retrieve the account id:
    • swauth-list -A http://127.0.0.1/auth/ -K $super_admin_key myaccount
      • you're looking for newuser's "account_id": "AUTH_205b4c23-6716-4a3b-91b2-5da36ce1d120"
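To verify the new credentials end to end, you can authenticate as the new user with the swift client - a quick sketch, using the account/user names from above and $pass from the first step:

 swift -A http://127.0.0.1/auth/v1.0 -U myaccount:newuser -K $pass stat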

Remove a user / account
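A sketch of the swauth commands for this, using the same auth URL and super_admin_key as elsewhere on this page (myaccount/newuser are example names; an account generally must be emptied of its users before it can be deleted):

  • remove a single user: swauth-delete-user -A http://127.0.0.1/auth/ -K $super_admin_key myaccount newuser
  • remove an entire account: swauth-delete-account -A http://127.0.0.1/auth/ -K $super_admin_key myaccount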

Show account information

This assumes swauth is prepped (swauth-prep)

The same command will show all accounts, all users within an account, or information specific to an individual user within an account, depending on the arguments passed, as sketched below.
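A sketch of the three forms (proxy-local auth URL and super_admin_key as elsewhere on this page; myaccount/newuser are example names):

  • list all accounts: swauth-list -A http://127.0.0.1/auth/ -K $super_admin_key
  • list all users within one account: swauth-list -A http://127.0.0.1/auth/ -K $super_admin_key myaccount
  • show details for a single user (its groups and auth key): swauth-list -A http://127.0.0.1/auth/ -K $super_admin_key myaccount newuser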

Troubleshooting - if you leave off the -A you will likely get socket.error: [Errno 111] ECONNREFUSED

Get the AUTH account string

  • using the swauth tool and the swauth super admin key:
    • where $user is your username and $proxy is a proxy node and $key is the swauth super_admin_key:
    • swauth-list -A http://$proxy/auth -K $key $user
    • the account AUTH string is labeled account_id and looks like AUTH_01234567-89ab-cdef-0123-456789abcdef (AUTH_8-4-4-4-12)
  • using curl and an account / password pair (see the sketch below)
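For the curl variant, the account string is the trailing component of the X-Storage-Url header returned by an authenticated request to the auth URL (a full transcript is in the next section). A sketch, assuming the mw:thumb user with its key in $pass and the proxy-local auth URL:

 curl -s -i -H 'X-Auth-User: mw:thumb' -H "X-Auth-Key: $pass" http://127.0.0.1/auth/v1.0 | grep X-Storage-Url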

Get an authenticated session token

The session token is temporary (default of 24 hours; this can be changed with the config option token_life, see [1]) and should be refetched if you are using one and get a 401 permission denied. The token is returned in the response headers of a GET request to the auth URL sent with the appropriate authentication headers. It comes back in two headers, X-Storage-Token and X-Auth-Token; I think X-Storage-Token is deprecated.

root@copper:/etc/swift# curl -k -v -H 'X-Auth-User: test:tester' -H 'X-Auth-Key: testing' http://127.0.0.1:8080/auth/v1.0
* About to connect() to 127.0.0.1 port 8080 (#0)
*   Trying 127.0.0.1... connected
* Connected to 127.0.0.1 (127.0.0.1) port 8080 (#0)
> GET /auth/v1.0 HTTP/1.1
> User-Agent: curl/7.19.7 (x86_64-pc-linux-gnu) libcurl/7.19.7 OpenSSL/0.9.8k zlib/1.2.3.3 libidn/1.15
> Host: 127.0.0.1:8080
> Accept: */*
> X-Auth-User: test:tester
> X-Auth-Key: testing
> 
< HTTP/1.1 200 OK
< X-Storage-Url: http://msfe-test.wikimedia.org:8080/v1/AUTH_854f8c66-63b6-4965-8b6c-5b2ccfe79aa8
< X-Storage-Token: AUTH_tk371f407774ef4a6580cb1c684308fb53
< X-Auth-Token: AUTH_tk371f407774ef4a6580cb1c684308fb53
< Content-Length: 126
< Date: Fri, 23 Mar 2012 23:23:54 GMT
< 
* Connection #0 to host 127.0.0.1 left intact
* Closing connection #0
{"storage": {"default": "local", "local": "http://msfe-test.wikimedia.org:8080/v1/AUTH_854f8c66-63b6-4965-8b6c-5b2ccfe79aa8"}}root@copper:/etc/swift# 

Create a container

With the swift client you create a container by POSTing to it; POSTing to an existing container modifies its metadata. Only users with admin rights (i.e. users in the .admin group) are allowed to create or modify containers.

Run the following commands on any host with the swift binaries installed (any host in the swift cluster or iron)
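For example, to create a container with the swift client, using the mw:thumb credentials as an example (the account used must have admin rights; the container name here is just an illustration):

 swift -A http://127.0.0.1/auth/v1.0 -U mw:thumb -K $pass post my-test-container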

List containers and contents

It's easiest to do all listing from a frontend host on the cluster you wish to list. You will need the account password to do any listing.

  • log into a swift frontend host on the cluster you want to use (eg ms-fe1.pmtpa.wmnet for the production pmtpa cluster)
  • read the account name and key (account password) from the file /a/common/wmf-config/PrivateSettings.php on tin, looking for the $wmfSwiftConfig stanza and checking the values for 'user' and 'key'. In the examples below, substitute the key for the variable '$pass'.

list of all containers
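  • ask for a listing of all containers in the account (same credentials as above; a sketch): swift -A http://127.0.0.1/auth/v1.0 -U mw:thumb -K $pass list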

list the contents of one container

  • ask for a listing of the container: swift -A http://127.0.0.1/auth/v1.0 -U mw:thumb -K $pass list wikipedia-commons-local-thumb.a2

list specific objects within a container

example: look for all thumbnails for the file Little_kitten_.jpg

  • start from a URL for a thumbnail (if you are at the original File: page, 'view image' on the existing thumbnail)
  • Pull out the project, "language", thumb, and shard to form the correct container and add -local into the middle
    • eg wikipedia-commons-local-thumb.a2
    • Note - only some containers are sharded: grep shard /etc/swift/proxy-server.conf to find out if your container should be sharded
    • unsharded containers leave off the shard eg wikipedia-commons-local-thumb
  • ask swift for a listing of the correct container with the --prefix option (it must come before the container name)
    • swift -A http://127.0.0.1/auth/v1.0 -U mw:thumb -K $pass list --prefix a/a2/Little_kit wikipedia-commons-local-thumb.a2
    • note that --prefix matches a substring anchored to the beginning of the object name (which starts with the shard path); it doesn't have to be a complete name.

Show specific info about a container or object

Note - these instructions will only show containers or objects the account has permission to see.

  • log into a swift frontend host on the cluster you want to use (eg ms-fe1.pmtpa.wmnet for the production pmtpa cluster)
  • read the key (account password) from config: pass=$(grep "^key" /etc/swift/proxy-server.conf | cut -f 3 -d\ )
  • ask for statistics about all containers: swift -A http://127.0.0.1/auth/v1.0 -U mw:thumb -K $pass stat
  • ask for statistics about the container: swift -A http://127.0.0.1/auth/v1.0 -U mw:thumb -K $pass stat wikipedia-commons-local-thumb.a2
  • ask for statistics about an object in a container: swift -A http://127.0.0.1/auth/v1.0 -U mw:thumb -K $pass stat wikipedia-commons-local-thumb.a2 a/a2/Little_kitten_.jpg/300px-Little_kitten_.jpg

Delete a container or object

Note - THIS IS DANGEROUS; it's easy to delete a container instead of an object by hitting return at the wrong time!

Deleting uses the same syntax as 'stat'. I recommend running stat on the object to get the command right, then doing CLI substitution (^stat^delete^ in bash).

  • log into a swift frontend host on the cluster you want to use (eg ms-fe1.pmtpa.wmnet for the production pmtpa cluster)
  • read the key (account password) from config: pass=$(grep "^key" /etc/swift/proxy-server.conf | cut -f 3 -d\ )
  • run swift stat on the object you want to delete
    • ben@ms-fe1:~$ swift -A http://127.0.0.1/auth/v1.0 -U mw:thumb -K $pass stat wikipedia-de-local-thumb f/f5/Wasserhose_1884.jpg/800px-Wasserhose_1884.jpg
  • swap stat for delete in the same command.
    • ben@ms-fe1:~$ ^stat^delete^

When you call delete for a container it will first delete all objects within the container and then delete the container itself.

Set up temp URL key on an account

MediaWiki makes use of a temporary URL key to download files; the key must be set on the mw:media account. On swift machines that report statistics you can find several .env files that let you "su" to each account, e.g.

 source /etc/swift/account_AUTH_mw.env
 swift post -m 'Temp-URL-Key:<your key>'
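To confirm the key took effect, an account stat in the same environment should show it as account metadata - a sketch (the exact header casing may vary):

 swift stat | grep -i temp-url-key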

Individual Commands - Managing Swift

Show current swift ring layout

There are three rings in Swift: account, object, and container. The swift-ring-builder command with a builder file will list the current state of the ring.

  1. swift-ring-builder /etc/swift/account.builder
  2. swift-ring-builder /etc/swift/container.builder
  3. swift-ring-builder /etc/swift/object.builder

Rebalance the rings

You only have to rebalance the rings after you have made a change to them. If there are no changes pending, the attempt to rebalance will fail with the error message "Cowardly refusing to save rebalance as it did not change at least 1%."

To rebalance the rings, run the actual rebalance on a copy of the ring files, then distribute the resulting rings to the rest of the cluster (via puppet).

The canonical copy of the rings is kept in operations/software/swift-ring.git with instructions on how to make changes and send them for review. After a change has been reviewed and merged it can be deployed (i.e. pushed to the puppet master).
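The rebalance itself is one command per ring, run from your checkout of the builder files:

 swift-ring-builder account.builder rebalance
 swift-ring-builder container.builder rebalance
 swift-ring-builder object.builder rebalance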

Add a proxy node to the cluster

  • Update site.pp in puppet to make the new proxy match existing proxies in that cluster
    • likely you'll include role::swift::xxx-yyy::proxy
    • maybe some ganglia-related stuff
  • Update the xxx-yyy config section in role/swift.pp
    • add the new server to the list of memcached_servers
  • Run puppet on the host twice, reboot, and run puppet again
  • Test the host
  • Add the new proxy to the load balancer (full details) if it's a load balanced cluster

Remove a failed proxy node from the cluster

  • Take the failed node out of the load balancer if necessary
  • Update the puppet configuration for the cluster
    • remove the failed node from the memcached list in the role/swift.pp in the cluster config

Add a storage node to the cluster

Start by doing the normal setup with a few tweaks, paying attention to the desired swift server layout.

Puppet will take care of all disks that use a single data partition - you should pass it all non-OS disks. You may have to create partitions on the OS disks for swift storage. The following is what I ran on ms-be1 (where the BIOS partition is on sda1 and sdb1, the OS partition is RAIDed across 120GB partitions on sda2 and sdb2, and sda3 and sdb3 are swap):

 # parted
 (parted) help
 (parted) print free
 (parted) mkpart swift-sda4 121GB 2000GB
 (parted) select /dev/sdb
 (parted) print free
 (parted) mkpart swift-sdb4 121GB 2000GB
 (parted) quit
 # mkfs -t xfs -i 512 -L swift-sda4 /dev/sda4
 # mkfs -t xfs -i 512 -L swift-sdb4 /dev/sdb4
 # mkdir /srv/swift-storage/sd{a,b}4
 # chown -R swift:swift /srv/swift-storage/sd{a,b}4
 # vi /etc/fstab # <-- add in a line for sda4 and sdb4 with the same xfs options as the rest
 # mount -a
 # reboot # just for good measure
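
The fstab entries should mirror the existing puppet-managed swift mounts; a sketch of what the two added lines might look like (mount points and XFS options here are assumptions - copy the options from the existing entries):

 LABEL=swift-sda4  /srv/swift-storage/sda4  xfs  noatime,nodiratime,nobarrier,logbufs=8  0  0
 LABEL=swift-sdb4  /srv/swift-storage/sdb4  xfs  noatime,nodiratime,nobarrier,logbufs=8  0  0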

After Puppet has finished setting up Swift and all device partitions are mounted successfully, add them to the rings. (Since the two partitions on sda and sdb are slightly smaller than the rest, they should get an appropriately smaller weight, eg 95 instead of 100.)

Add a device (drive) to a ring

Select the following values:

  • zone - each rack is its own zone; all servers within a rack and all drives within a server should be in the same zone
    • list all the drives to see what zones are in use with swift-ring-builder /etc/swift/account.builder (see above)
  • ip - the IP address of the storage node
  • dev - the short name of the partition - eg 'sdc1'
  • weight - how big the partition is in gigabytes (powers of 10, not 2), e.g. 2TB -> 2000

Note: see #Rebalance_the_rings for how to obtain a copy of the rings.

    swift-ring-builder account.builder add z${zone}-${ip}:6002/${dev} $weight
    swift-ring-builder container.builder add z${zone}-${ip}:6001/${dev} $weight
    swift-ring-builder object.builder add z${zone}-${ip}:6000/${dev} $weight

Example, to add device /dev/sda4 on ms-be5:

   swift-ring-builder account.builder add z5-10.0.6.204:6002/sda4 100
   swift-ring-builder container.builder add z5-10.0.6.204:6001/sda4 100
   swift-ring-builder object.builder add z5-10.0.6.204:6000/sda4 100

After you're done, you must rebalance the three rings and push them out to the rest of the cluster.

Remove a failed storage node from the cluster

Remove each of the devices on the failed node from the rings, rebalance, and distribute the new ring files.

Remove (fail out) a drive from a ring

There are two conditions under which you will want to remove a device from service:

  • when the device is dead or the host is down and unreachable
  • when it's still working but you want to decommission it or pull it out for service

For the former, you just remove the device; for the latter, you can nicely pull data off the device before shutting it off by changing the device weight first.

remove failed devices

The command to remove a device is swift-ring-builder /etc/swift/<ring>.builder remove d###. Here's the sequence:

  • find the IDs of the devices you want to remove. You're looking for the 'id' using the IP address and name as your keys. You should verify that the ID is the same across all three rings; I'm only showing one ring here for the example.
root@ms-fe2:~# swift-ring-builder /etc/swift/account.builder
/etc/swift/account.builder, build version 192
65536 partitions, 3 replicas, 5 zones, 161 devices, 0.10 balance
The minimum number of hours before a partition can be reassigned is 3
Devices:    id  zone      ip address  port      name weight partitions balance meta
             0     1      10.0.0.250  6002      sda1  25.00        844    0.02
             1     1      10.0.0.250  6002     sdaa1  25.00        844    0.02
             2     1      10.0.0.250  6002     sdab1  25.00        844    0.02
             3     1      10.0.0.250  6002     sdad1  25.00        844    0.02
             4     1      10.0.0.250  6002     sdae1  25.00        844    0.02
             5     1      10.0.0.250  6002     sdaf1  25.00        844    0.02
             etc. etc. etc.
  • remove them (in this example I'm removing an entire host; you can remove only a single drive if necessary.) Note that in our environment, account and container device IDs often (but not always) match and object device IDs are different. You should check each ring individually.
cp -a /etc/swift ~; cd ~/swift;
for i in {150..161}; do
  swift-ring-builder account.builder remove d$i
done                                                                                            
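
If the device IDs differ between rings, it can be easier to remove by search value (e.g. the host's IP); swift-ring-builder should prompt with the matched devices for confirmation when more than one matches. A sketch, using the IP from the listing above:

 for ring in account container object; do
   swift-ring-builder ${ring}.builder remove 10.0.0.250
 done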

remove working devices for maintenance

To remove a device for maintenance, you set the weight on the device to 0, rebalance, wait a while (a day or two), then do your maintenance. The examples here assume you're removing all the devices on a node. Note that I'm only checking one of the three rings but taking action on all three. To be completely sure we should check all three rings but by policy we keep all three rings the same.

  • find the IDs for the devices you want to remove (in this example, I'm pulling out ms-be5)
 root@ms-fe1:/etc/swift# swift-ring-builder /etc/swift/account.builder search 10.0.6.204
 Devices:    id  zone      ip address  port      name weight partitions balance meta
            186     8      10.0.6.204  6002      sda4  95.00       1993  -12.24
            187     8      10.0.6.204  6002      sdb4  95.00       1993  -12.24
            188     8      10.0.6.204  6002      sdc1 100.00       2098  -12.23
            189     8      10.0.6.204  6002      sdd1 100.00       2097  -12.27
            190     8      10.0.6.204  6002      sde1 100.00       2097  -12.27
            191     8      10.0.6.204  6002      sdf1 100.00       2097  -12.27
            192     8      10.0.6.204  6002      sdg1 100.00       2097  -12.27
            193     8      10.0.6.204  6002      sdh1 100.00       2097  -12.27
            194     8      10.0.6.204  6002      sdi1 100.00       2097  -12.27
            195     8      10.0.6.204  6002      sdj1 100.00       2097  -12.27
            196     8      10.0.6.204  6002      sdk1 100.00       2097  -12.27
            197     8      10.0.6.204  6002      sdl1 100.00       2097  -12.27
  • set their weight to 0

 cd [your swift-ring.git checkout]/[swift instance]   # e.g. eqiad-prod

 for id in {186..197}; do 
   for ring in account object container ; do 
     swift-ring-builder ${ring}.builder set_weight d${id} 0
   done
 done
Alternatively, for a given ring, you can run
 swift-ring-builder ${ring}.builder set_weight 10.0.6.204 0
It will prompt you with a list of the devices that will be affected and give you a chance to confirm or cancel.
  • check what you've done
 git diff -w
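
When the maintenance is done, restore the original weights the same way, then rebalance and deploy as usual - a sketch, using the device IDs and weights from the listing above:

 for ring in account object container; do
   # sda4/sdb4 were weight 95, the remaining drives 100
   swift-ring-builder ${ring}.builder set_weight d186 95
   swift-ring-builder ${ring}.builder set_weight d187 95
   for id in {188..197}; do swift-ring-builder ${ring}.builder set_weight d${id} 100; done
 done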

Replacing a disk without touching the rings

If the time span for replacement is short enough, the failed disk can be left unmounted and swapped with a working one. After successful replacement it should be added back to the RAID controller and the RAID cache discarded:

 megacli -GetPreservedCacheList -a0
 megacli -DiscardPreservedCache -L'disk_number' -a0
 megacli -CfgEachDskRaid0 WB RA Direct CachedBadBBU -a0

Change info for devices in a ring

  • first use the search subcommand to find the devices you want to update. Example, looking for all the devices on 10.0.6.205 with port 6002, which is wrong:
 root@ms-be11:~/swift-rings/swift# swift-ring-builder container.builder search z8-10.0.6.205:6002
 Devices:    id  zone      ip address  port      name weight partitions balance meta
            202     8      10.0.6.205  6002      sdn3 100.00      13108  -33.33 
            203     8      10.0.6.205  6002      sdm3 100.00      13108  -33.33 
 root@ms-be11:~/swift-rings/swift#
  • next, if it showed you the right devices, use the set_info subcommand to replace the incorrect info. In this example, the port number is wrong, so we update it as follows:
 root@ms-be11:~/swift-rings/swift# swift-ring-builder container.builder set_info z8-10.0.6.205:6002 10.0.6.205:6001
 Matched more than one device:
     d202z8-10.0.6.205:6002/sdn3_""
     d203z8-10.0.6.205:6002/sdm3_""
 Are you sure you want to update the info for these 2 devices? (y/N) y
 Device d202z8-10.0.6.205:6002/sdn3_"" is now d202z8-10.0.6.205:6001/sdn3_""
 Device d203z8-10.0.6.205:6002/sdm3_"" is now d203z8-10.0.6.205:6001/sdm3_""
 root@ms-be11:~/swift-rings/swift# 
  • check your work:
 root@ms-be11:~/swift-rings/swift# swift-ring-builder container.builder 
 container.builder, build version 809
 65536 partitions, 3 replicas, 4 zones, 10 devices, 33.33 balance
 The minimum number of hours before a partition can be reassigned is 3
 Devices:    id  zone      ip address  port      name weight partitions balance meta
            186     8      10.0.6.204  6001      sda3 100.00      19660   -0.00 
            187     8      10.0.6.204  6001      sdb3 100.00      19660   -0.00 
            194    12      10.0.6.208  6001      sda3 100.00      21845   11.11 
            195    12      10.0.6.208  6001      sdb3 100.00      21845   11.11 
            198    14      10.0.6.212  6001      sda3 100.00      21846   11.11 
            199    14      10.0.6.212  6001      sdb3 100.00      21846   11.11 
            200    15      10.0.6.213  6001      sda3 100.00      21845   11.11 
            201    15      10.0.6.213  6001      sdb3 100.00      21845   11.11 
            202     8      10.0.6.205  6001      sdn3 100.00      13108  -33.33 
            203     8      10.0.6.205  6001      sdm3 100.00      13108  -33.33 
  • and now write the rings (you don't rebalance them, because you don't actually change partitioning):
 root@ms-be11:~/swift-rings/swift# swift-ring-builder container.builder write_ring
 root@ms-be11:~/swift-rings/swift#

Nuke a swift cluster

Only do this on test clusters - it is unrecoverable and destroys all the data in the cluster.

  • on all servers:
    • stop all services: swift-init all stop
    • remove all ring data: rm /etc/swift/*.{builder,ring.gz}
  • on the storage nodes:
    • remove all storage content: for i in /srv/swift-storage/sd*; do rm -r $i/* & done (or just reformat the drives - faster)

The swift cluster is now destroyed. To rebuild, follow the instructions in Swift/Setup_New_Swift_Cluster

Change the origin server

No longer needed

  • in puppet
    • switch the thumbhost = ms5.pmtpa.wmnet line in /etc/swift/*.conf to point to the new server