Swift/How To
General Prep
Nearly all of these commands are best executed from a swift proxy host (eg ms-fe1.pmtpa.wmnet) and require either the master password or an account password. Both the master password (super_admin_key) and the specific users' passwords we have at Wikimedia are accessible in the swift proxy config file /etc/swift/proxy-server.conf or in the private puppet repository.
All of these tasks are explained in more detail and with far more context in the official swift documentation. This page is a greatly restricted version of that information, directed specifically at tasks we'll need to do at WMF. For this reason I leave out many options and caveats and assume things like the authentication type (swauth), to restrict it to what's correct for our installation. It may or may not be useful for a wider audience.
Set up an entire swift cluster
This is documented elsewhere: Swift/Setup_New_Swift_Cluster
Individual Commands - interacting with Swift
Create a user / account
This assumes swauth is prepped (swauth-prep).
- generate a password: pass=$(pwgen -s 12 1)
- add the user: swauth-add-user -A http://127.0.0.1/auth/ -K $super_admin_key -a myaccount newuser $pass
- (swift has multiple accounts, each account has users, each user has a password)
- note - swift's user's passwords are visible (plaintext) to anybody with the $super_admin_key
- test it and retrieve the account id:
- swauth-list -A http://127.0.0.1/auth/ -K $super_admin_key myaccount
- you're looking for newuser's "account_id": "AUTH_205b4c23-6716-4a3b-91b2-5da36ce1d120"
Remove a user / account
Show account information
This assumes swauth is prepped (swauth-prep).
The same command will show all accounts, all users within an account, or information specific to an individual user within an account, depending on the arguments passed.
- show all accounts
- swauth-list -A http://127.0.0.1/auth/ -K $super_admin_key
- show all users for an account
- swauth-list -A http://127.0.0.1/auth/ -K $super_admin_key account
- show a specific user
- swauth-list -A http://127.0.0.1/auth/ -K $super_admin_key account user
Troubleshooting - if you leave off the -A you will likely get socket.error: [Errno 111] ECONNREFUSED
Get the AUTH account string
- using the swauth tool and the swauth super admin key:
- where $user is your username and $proxy is a proxy node and $key is the swauth super_admin_key:
- swauth-list -A http://$proxy/auth -K $key $user
- eg
swauth-list -A http://127.0.0.1/auth -K abcdefghijkl test
- the account AUTH string is labeled account_id and looks like AUTH_01234567-89ab-cdef-0123-456789abcdef (AUTH_8-4-4-4-12)
- using curl and an account / password pair
- where $account:$user is an account:user pair and $key is the user password and $proxy is a proxy server:
- curl -k -v -H "X-Auth-User: $account:$user" -H "X-Auth-Key: $key" http://$proxy/auth/v1.0 (double quotes, so the shell expands the variables)
- eg
curl -k -v -H 'X-Auth-User: test:tester' -H 'X-Auth-Key: testing' http://127.0.0.1:8080/auth/v1.0
- the account AUTH string is the last part of the X-Storage-URL header
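Either way, the AUTH string can be pulled out with a couple of lines of shell. A minimal sketch, assuming swauth-list prints JSON with an "account_id" key as in the examples above; account_id_from_swauth and account_id_from_url are hypothetical helper names, not part of swift or swauth.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read swauth-list JSON for an account on stdin and
# print its account_id (the AUTH_8-4-4-4-12 string).
account_id_from_swauth() {
  grep -o '"account_id": *"AUTH_[^"]*"' | sed 's/.*"\(AUTH_[^"]*\)"$/\1/'
}

# Hypothetical helper: the AUTH string is the last path component of the
# X-Storage-Url header, e.g. http://host:8080/v1/AUTH_... -> AUTH_...
account_id_from_url() {
  local url=$1
  printf '%s\n' "${url##*/}"
}
```

For example: swauth-list -A http://127.0.0.1/auth -K $key test | account_id_from_swauth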
Get an authenticated session token
The session token is temporary (default of 24 hours, can be changed by the config option token_life, see [1]) and should be refetched if you are using one and get a 401 Unauthorized. The token is returned in the response headers of a GET request sent with the appropriate authentication headers, in two headers: X-Storage-Token and X-Auth-Token. I think the X-Storage-Token is deprecated.
root@copper:/etc/swift# curl -k -v -H 'X-Auth-User: test:tester' -H 'X-Auth-Key: testing' http://127.0.0.1:8080/auth/v1.0
* About to connect() to 127.0.0.1 port 8080 (#0)
*   Trying 127.0.0.1... connected
* Connected to 127.0.0.1 (127.0.0.1) port 8080 (#0)
> GET /auth/v1.0 HTTP/1.1
> User-Agent: curl/7.19.7 (x86_64-pc-linux-gnu) libcurl/7.19.7 OpenSSL/0.9.8k zlib/1.2.3.3 libidn/1.15
> Host: 127.0.0.1:8080
> Accept: */*
> X-Auth-User: test:tester
> X-Auth-Key: testing
>
< HTTP/1.1 200 OK
< X-Storage-Url: http://msfe-test.wikimedia.org:8080/v1/AUTH_854f8c66-63b6-4965-8b6c-5b2ccfe79aa8
< X-Storage-Token: AUTH_tk371f407774ef4a6580cb1c684308fb53
< X-Auth-Token: AUTH_tk371f407774ef4a6580cb1c684308fb53
< Content-Length: 126
< Date: Fri, 23 Mar 2012 23:23:54 GMT
<
* Connection #0 to host 127.0.0.1 left intact
* Closing connection #0
{"storage": {"default": "local", "local": "http://msfe-test.wikimedia.org:8080/v1/AUTH_854f8c66-63b6-4965-8b6c-5b2ccfe79aa8"}}root@copper:/etc/swift#
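Scripts usually want the token and storage URL in variables rather than read off curl -v output. A minimal sketch, assuming the v1.0 auth endpoint shown above; parse_auth_response and get_swift_token are hypothetical names. It reads X-Auth-Token (rather than the possibly deprecated X-Storage-Token) and X-Storage-Url from the response headers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read an HTTP auth response (headers, optionally with
# curl -v style "< " prefixes) on stdin and print "<token> <storage-url>".
parse_auth_response() {
  local line name value token='' url=''
  while IFS= read -r line; do
    line=${line%$'\r'}   # strip the CR from CRLF line endings
    line=${line#'< '}    # strip the curl -v response prefix, if any
    name=${line%%:*}
    value=${line#*: }
    case ${name,,} in    # header names are case-insensitive (bash 4+)
      x-auth-token)  token=$value ;;
      x-storage-url) url=$value ;;
    esac
  done
  printf '%s %s\n' "$token" "$url"
}

# Hypothetical wrapper. usage: get_swift_token PROXY ACCOUNT:USER KEY
get_swift_token() {
  curl -s -i -H "X-Auth-User: $2" -H "X-Auth-Key: $3" "http://$1/auth/v1.0" |
    parse_auth_response
}
```

For example, read -r token url < <(get_swift_token 127.0.0.1:8080 test:tester testing) followed by curl -H "X-Auth-Token: $token" "$url" lists the account's containers; on a 401, fetch a new token.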
Create a container
You create a container by POSTing to it with the swift client's post subcommand, and you modify an existing container the same way. Only users with admin rights (aka users in the .admin group) are allowed to create or modify containers.
Run the following commands on any host with the swift binaries installed (any host in the swift cluster, or iron).
- create a container with default permissions (r/w by owner and nobody else)
- swift -A http://ms-fe.pmtpa.wmnet/auth/v1.0 -U mw:thumb -K $pass post container-name
- create a container with global read permissions
- swift -A http://ms-fe.pmtpa.wmnet/auth/v1.0 -U mw:thumb -K $pass post -r '.r:*' container-name
List containers and contents
It's easiest to do all listing from a frontend host on the cluster you wish to list. You will need the account password to do any listing.
- log into a swift frontend host on the cluster you want to use (eg ms-fe1.pmtpa.wmnet for the production pmtpa cluster)
- read the account name and key (account password) from the file /a/common/wmf-config/PrivateSettings.php on tin, looking for the $wmfSwiftConfig stanza and checking the values for 'user' and 'key'. In the examples below, substitute the key for the variable '$pass'.
list of all containers
- ask for a listing of all containers:
swift -A http://127.0.0.1/auth/v1.0 -U mw:thumb -K $pass list
list the contents of one container
- ask for a listing of the container:
swift -A http://127.0.0.1/auth/v1.0 -U mw:thumb -K $pass list wikipedia-commons-local-thumb.a2
list specific objects within a container
example: look for all thumbnails for the file Little_kitten_.jpg
- start from a URL for a thumbnail (if you are at the original File: page, 'view image' on the existing thumbnail)
- Pull out the project, "language", thumb, and shard to form the correct container and add -local into the middle
- eg wikipedia-commons-local-thumb.a2
- Note - only some containers are sharded; run
grep shard /etc/swift/proxy-server.conf
to find out if your container should be sharded. Unsharded containers leave off the shard, eg wikipedia-commons-local-thumb
- ask swift for a listing of the correct container with the --prefix option (it must come before the container name)
swift -A http://127.0.0.1/auth/v1.0 -U mw:thumb -K $pass list --prefix a/a2/Little_kit wikipedia-commons-local-thumb.a2
- note that --prefix is a substring anchored to the beginning of the object name; it doesn't have to be a complete name.
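The container and prefix can be derived mechanically from a thumbnail URL. A minimal sketch, assuming a full thumbnail URL of the form https://upload.wikimedia.org/<project>/<language>/thumb/<h1>/<h2>/<file>/<size>-<file> and that you already know from the grep above whether the container is sharded; thumb_container, thumb_prefix and _thumb_path are hypothetical helpers, and original (non-thumb) file URLs are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for thumbnail URLs such as
#   https://upload.wikimedia.org/wikipedia/commons/thumb/a/a2/Little_kitten_.jpg/46px-Little_kitten_.jpg

# Strip scheme and host, leaving project/language/thumb/h1/h2/file/thumbname.
_thumb_path() {
  local path=${1#*://}   # drop the scheme
  printf '%s\n' "${path#*/}"   # drop the host
}

# usage: thumb_container URL [sharded: yes|no, default yes]
thumb_container() {
  local project lang zone h1 h2 rest container
  IFS=/ read -r project lang zone h1 h2 rest <<< "$(_thumb_path "$1")"
  container="${project}-${lang}-local-${zone}"   # "-local" goes in the middle
  if [ "${2:-yes}" = yes ]; then
    container="${container}.${h2}"               # shard suffix, eg .a2
  fi
  printf '%s\n' "$container"
}

# Prefix matching every thumbnail of the file: <h1>/<h2>/<file>/
thumb_prefix() {
  local project lang zone h1 h2 rest
  IFS=/ read -r project lang zone h1 h2 rest <<< "$(_thumb_path "$1")"
  printf '%s/%s/%s/\n' "$h1" "$h2" "${rest%%/*}"
}
```

For example, with the URL in $u: swift -A http://127.0.0.1/auth/v1.0 -U mw:thumb -K $pass list --prefix "$(thumb_prefix "$u")" "$(thumb_container "$u")"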
Show specific info about a container or object
Note - these instructions will only show containers or objects the account has permission to see.
- log into a swift frontend host on the cluster you want to use (eg ms-fe1.pmtpa.wmnet for the production pmtpa cluster)
- read the key (account password) from config:
pass=$(grep "^key" /etc/swift/proxy-server.conf | cut -f 3 -d\ )
- ask for statistics about all containers:
swift -A http://127.0.0.1/auth/v1.0 -U mw:thumb -K $pass stat
- ask for statistics about the container:
swift -A http://127.0.0.1/auth/v1.0 -U mw:thumb -K $pass stat wikipedia-commons-local-thumb.a2
- ask for statistics about an object in a container:
swift -A http://127.0.0.1/auth/v1.0 -U mw:thumb -K $pass stat wikipedia-commons-local-thumb.a2 a/a2/Little_kitten_.jpg/300px-Little_kitten_.jpg
Delete a container or object
Note - THIS IS DANGEROUS; it's easy to delete a container instead of an object by hitting return at the wrong time!
Deleting uses the same syntax as 'stat'. I recommend running stat on an object to get the command right, then using bash quick substitution (^stat^delete^) to rerun it as a delete.
- log into a swift frontend host on the cluster you want to use (eg ms-fe1.pmtpa.wmnet for the production pmtpa cluster)
- read the key (account password) from config:
pass=$(grep "^key" /etc/swift/proxy-server.conf | cut -f 3 -d\ )
- run swift stat on the object you want to delete:
ben@ms-fe1:~$ swift -A http://127.0.0.1/auth/v1.0 -U mw:thumb -K $pass stat wikipedia-de-local-thumb f/f5/Wasserhose_1884.jpg/800px-Wasserhose_1884.jpg
- swap stat for delete in the same command.
ben@ms-fe1:~$ ^stat^delete^
When you call delete for a container it will first delete all objects within the container and then delete the container itself.
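Because a missing object name turns an object delete into a container delete, one way to guard against that mistake is a wrapper that refuses to run without both arguments. A minimal sketch, assuming $pass holds the account key as in the steps above; swift_delete_object and the SWIFT_CMD override (SWIFT_CMD=echo gives a dry run) are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical guard: delete exactly one object, never a whole container.
# usage: swift_delete_object CONTAINER OBJECT
# Uses the account key in $pass; SWIFT_CMD=echo prints instead of deleting.
swift_delete_object() {
  local container=${1:-} object=${2:-}
  if [ -z "$container" ] || [ -z "$object" ]; then
    echo "usage: swift_delete_object CONTAINER OBJECT (refusing to delete a container)" >&2
    return 1
  fi
  "${SWIFT_CMD:-swift}" -A http://127.0.0.1/auth/v1.0 -U mw:thumb -K "${pass:-}" \
    delete "$container" "$object"
}
```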
Setup temp url key on an account
MediaWiki makes use of a temporary URL key to download files; the key must be set on the mw:media account. On swift machines that report statistics you can find several .env files to "su" to each account, e.g.
source /etc/swift/account_AUTH_mw.env
swift post -m 'Temp-URL-Key:<your key>'
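For context on what the key is for: Swift's tempurl middleware accepts a URL whose signature is HMAC-SHA1, keyed with this Temp-URL-Key, over "METHOD\nEXPIRES\nPATH". A minimal sketch that signs such a URL with openssl, assuming openssl is installed and PATH has the form /v1/AUTH_<account>/<container>/<object>; temp_url is a hypothetical helper, not how MediaWiki itself builds these URLs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sign a Swift TempURL.
#   sig = HMAC-SHA1(key, "METHOD\nEXPIRES\nPATH")
# usage: temp_url KEY METHOD EXPIRES PATH
# prints PATH?temp_url_sig=<hex>&temp_url_expires=<EXPIRES>
temp_url() {
  local key=$1 method=$2 expires=$3 path=$4 sig
  sig=$(printf '%s\n%s\n%s' "$method" "$expires" "$path" |
        openssl dgst -sha1 -hmac "$key" | awk '{print $NF}')  # last field is the hex digest
  printf '%s?temp_url_sig=%s&temp_url_expires=%s\n' "$path" "$sig" "$expires"
}
```

EXPIRES is a Unix timestamp, eg $(( $(date +%s) + 3600 )) for one hour from now; prepend the proxy's scheme and host to get a full URL.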
Individual Commands - Managing Swift
Show current swift ring layout
There are three rings in Swift: account, object, and container. The swift-ring-builder command with a builder file will list the current state of the ring.
- swift-ring-builder /etc/swift/account.builder
- swift-ring-builder /etc/swift/container.builder
- swift-ring-builder /etc/swift/object.builder
Rebalance the rings
You only have to rebalance the rings after you have made a change to them. If there are no changes pending, the attempt to rebalance will fail with the error message "Cowardly refusing to save rebalance as it did not change at least 1%."
To rebalance the rings you run the actual rebalance on a copy of the ring files then distribute the rings to the rest of the cluster (via puppet).
The canonical copy of the rings is kept in operations/software/swift-ring.git with instructions on how to make changes and send them for review. After a change has been reviewed and merged it can be deployed (i.e. pushed to the puppet master)
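Rebalancing is the same command run against all three builder files. A minimal sketch, assuming you are in the directory of your swift-ring.git checkout that holds account.builder, container.builder and object.builder; for_each_ring and the RING_BUILDER override (RING_BUILDER=echo gives a dry run) are hypothetical. It runs all three rings even if one fails, and returns the last non-zero exit status.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one swift-ring-builder subcommand against the
# account, container and object builder files in the current directory.
# usage: for_each_ring SUBCOMMAND [ARGS...]   e.g. for_each_ring rebalance
for_each_ring() {
  local ring rc=0
  for ring in account container object; do
    "${RING_BUILDER:-swift-ring-builder}" "${ring}.builder" "$@" || rc=$?
  done
  return "$rc"
}
```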
Add a proxy node to the cluster
- Update site.pp in puppet to make the new proxy match existing proxies in that cluster; likely you'll include role::swift::xxx-yyy::proxy
- maybe some ganglia-related stuff
- Update the xxx-yyy config section in role/swift.pp
- add the new server to the list of memcached_servers
- Run puppet on the host twice, reboot, and run puppet again
- Test the host
- curl for a file that exists in swift from both a working host and the new host
- eg:
curl -o /tmp/foo -v -H "Host: upload.wikimedia.org" http://ms-fe2.pmtpa.wmnet/wikipedia/commons/thumb/a/a2/Little_kitten_.jpg/46px-Little_kitten_.jpg
- Add the new proxy to the load balancer (full details) if it's a load balanced cluster
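The comparison in the test step can be scripted. A minimal sketch, assuming both proxies answer plain HTTP on port 80 with the Host header set as in the curl example; compare_proxies is a hypothetical helper that downloads the same path through each proxy and compares the bytes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fetch PATH through two proxies with the
# upload.wikimedia.org Host header and compare the downloaded bytes.
# usage: compare_proxies PATH WORKING_PROXY NEW_PROXY
# Prints match / MISMATCH / FETCH FAILED and returns 0 / 1 / 2.
compare_proxies() {
  local path=$1 good=$2 new=$3 tmp rc
  tmp=$(mktemp -d) || return 2
  if ! curl -sf -o "$tmp/good" -H "Host: upload.wikimedia.org" "http://$good$path" ||
     ! curl -sf -o "$tmp/new" -H "Host: upload.wikimedia.org" "http://$new$path"; then
    echo "FETCH FAILED"
    rm -rf "$tmp"
    return 2
  fi
  if cmp -s "$tmp/good" "$tmp/new"; then
    echo "match"; rc=0
  else
    echo "MISMATCH"; rc=1
  fi
  rm -rf "$tmp"
  return "$rc"
}
```

For example: compare_proxies /wikipedia/commons/thumb/a/a2/Little_kitten_.jpg/46px-Little_kitten_.jpg ms-fe1.pmtpa.wmnet ms-fe2.pmtpa.wmnet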
Remove a failed proxy node from the cluster
- Take the failed node out of the load balancer if necessary
- Update the puppet configuration for the cluster
- remove the failed node from the memcached list in the role/swift.pp in the cluster config
Add a storage node to the cluster
Start by doing the normal setup with a few tweaks, paying attention to the desired swift server layout.
Puppet will take care of all disks that are only 1 partition used for data - you should pass it all non-OS disks. You may have to create partitions on the OS disk for swift storage. The following is what I ran on ms-be1 (where the bios is on sda1 and sdb1, the OS partition is raided across 120GB partitions on sda2 and sdb2, and sda3 and sdb3 are swap):
# parted
(parted) help
(parted) print free
(parted) mkpart swift-sda4 121GB 2000GB
(parted) select /dev/sdb
(parted) print free
(parted) mkpart swift-sdb4 121GB 2000GB
(parted) quit
# mkfs -t xfs -i size=512 -L swift-sda4 /dev/sda4
# mkfs -t xfs -i size=512 -L swift-sdb4 /dev/sdb4
# mkdir /srv/swift-storage/sd{a,b}4
# chown -R swift:swift /srv/swift-storage/sd{a,b}4
# vi /etc/fstab   # <-- add in a line for sda4 and sdb4 with the same xfs options as the rest
# mount -a
# reboot          # just for good measure
After Puppet has finished setting up Swift and all device partitions are mounted successfully, add them to the rings. (Since the two partitions on sda and sdb are slightly smaller than the rest, they should get an appropriately smaller weight, eg 95 instead of 100.)
Add a device (drive) to a ring
Select the following values:
- zone : each rack is its own zone; all servers within a rack and all drives within a server should be the same zone
- list all the drives to see what zones are in use with swift-ring-builder /etc/swift/account.builder (see above)
- ip - ip of the storage node
- dev - the short name of the partition - eg 'sdc1'
- weight - how big the partition is in gigabytes (powers of 10, not 2) (e.g. 2TB -> 2000)
Note: see #Rebalance_the_rings for how to obtain a copy of the rings.
swift-ring-builder account.builder add z${zone}-${ip}:6002/${dev} $weight
swift-ring-builder container.builder add z${zone}-${ip}:6001/${dev} $weight
swift-ring-builder object.builder add z${zone}-${ip}:6000/${dev} $weight
Example, to add device /dev/sda4 on ms-be5:
swift-ring-builder account.builder add z5-10.0.6.204:6002/sda4 100
swift-ring-builder container.builder add z5-10.0.6.204:6001/sda4 100
swift-ring-builder object.builder add z5-10.0.6.204:6000/sda4 100
After you're done, you must rebalance the three rings and push them out to the rest of the cluster.
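The three add commands differ only in ring name and port, so they can be generated from the four values chosen above. A minimal sketch, assuming the ports used in this page's examples (6002 account, 6001 container, 6000 object) and that you are in the directory holding the three .builder files; add_device, weight_for_bytes and the RING_BUILDER override (RING_BUILDER=echo gives a dry run) are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add one device to all three rings.
# usage: add_device ZONE IP DEV WEIGHT
add_device() {
  local zone=$1 ip=$2 dev=$3 weight=$4
  local rb=${RING_BUILDER:-swift-ring-builder}
  "$rb" account.builder add "z${zone}-${ip}:6002/${dev}" "$weight" &&
  "$rb" container.builder add "z${zone}-${ip}:6001/${dev}" "$weight" &&
  "$rb" object.builder add "z${zone}-${ip}:6000/${dev}" "$weight"
}

# Hypothetical helper: weight = partition size in decimal gigabytes
# (10^9 bytes), rounded down.
# e.g. weight_for_bytes "$(blockdev --getsize64 /dev/sdc1)"
weight_for_bytes() {
  echo $(( $1 / 1000000000 ))
}
```

add_device 5 10.0.6.204 sda4 100 reproduces the ms-be5 example above.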
Remove a failed storage node from the cluster
Remove each of the devices on the failed node from the rings, rebalance, and distribute the new ring files.
Remove (fail out) a drive from a ring
There are two conditions in which you will want to remove a device from service:
- when the device is dead or the host is down and unreachable
- when it's still working but you want to decommission it or pull it out for service
For the former, you just remove the device; for the latter, you can nicely pull data off the device before shutting it off by changing the device weight first.
remove failed devices
The command to remove a device is swift-ring-builder /etc/swift/<ring>.builder remove d###. Here's the sequence:
- find the IDs of the devices you want to remove. You're looking for the 'id' using the IP address and name as your keys. You should verify that the ID is the same across all three rings; I'm only showing one ring here for the example.
root@ms-fe2:~# swift-ring-builder /etc/swift/account.builder
/etc/swift/account.builder, build version 192
65536 partitions, 3 replicas, 5 zones, 161 devices, 0.10 balance
The minimum number of hours before a partition can be reassigned is 3
Devices:    id  zone      ip address  port      name weight partitions balance meta
             0     1      10.0.0.250  6002      sda1  25.00        844    0.02
             1     1      10.0.0.250  6002     sdaa1  25.00        844    0.02
             2     1      10.0.0.250  6002     sdab1  25.00        844    0.02
             3     1      10.0.0.250  6002     sdad1  25.00        844    0.02
             4     1      10.0.0.250  6002     sdae1  25.00        844    0.02
             5     1      10.0.0.250  6002     sdaf1  25.00        844    0.02
etc. etc. etc.
- remove them (in this example I'm removing an entire host; you can remove only a single drive if necessary.) Note that in our environment, account and container device IDs often (but not always) match and object device IDs are different. You should check each ring individually.
cp -a /etc/swift ~
cd ~/swift
for i in {150..161}; do
    swift-ring-builder account.builder remove d$i
done
- rebalance the rings and distribute them.
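Since device IDs can differ between rings, the lookup and removal can be done per ring. A minimal sketch, assuming the older search output layout shown on this page (columns: id zone ip port name weight partitions balance meta) and that you are in the directory holding the three .builder files; device_ids_for_ip, remove_host_from_rings and the RING_BUILDER override are hypothetical. Newer swift-ring-builder versions print extra columns (region, replication ip), so check the awk field numbers against your output before use.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `swift-ring-builder <ring>.builder search <ip>`
# output on stdin and print the device ids whose "ip address" column is $1.
# Assumes the column layout: id zone ip port name weight partitions balance meta
device_ids_for_ip() {
  awk -v ip="$1" '$1 ~ /^[0-9]+$/ && $3 == ip { print $1 }'
}

# Hypothetical helper: remove every device on one host from all three rings,
# looking the ids up separately in each ring because they can differ.
# RING_BUILDER overrides the binary (e.g. a stub function for testing).
# usage: remove_host_from_rings IP   (then rebalance and distribute)
remove_host_from_rings() {
  local ip=$1 ring id rb=${RING_BUILDER:-swift-ring-builder}
  for ring in account container object; do
    for id in $("$rb" "${ring}.builder" search "$ip" | device_ids_for_ip "$ip"); do
      "$rb" "${ring}.builder" remove "d${id}"
    done
  done
}
```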
remove working devices for maintenance
To remove a device for maintenance, you set the weight on the device to 0, rebalance, wait a while (a day or two), then do your maintenance. The examples here assume you're removing all the devices on a node. Note that I'm only checking one of the three rings but taking action on all three. To be completely sure we should check all three rings but by policy we keep all three rings the same.
- find the IDs for the devices you want to remove (in this example, I'm pulling out ms-be5)
root@ms-fe1:/etc/swift# swift-ring-builder /etc/swift/account.builder search 10.0.6.204
Devices:    id  zone      ip address  port      name weight partitions balance meta
           186     8      10.0.6.204  6002      sda4  95.00       1993  -12.24
           187     8      10.0.6.204  6002      sdb4  95.00       1993  -12.24
           188     8      10.0.6.204  6002      sdc1 100.00       2098  -12.23
           189     8      10.0.6.204  6002      sdd1 100.00       2097  -12.27
           190     8      10.0.6.204  6002      sde1 100.00       2097  -12.27
           191     8      10.0.6.204  6002      sdf1 100.00       2097  -12.27
           192     8      10.0.6.204  6002      sdg1 100.00       2097  -12.27
           193     8      10.0.6.204  6002      sdh1 100.00       2097  -12.27
           194     8      10.0.6.204  6002      sdi1 100.00       2097  -12.27
           195     8      10.0.6.204  6002      sdj1 100.00       2097  -12.27
           196     8      10.0.6.204  6002      sdk1 100.00       2097  -12.27
           197     8      10.0.6.204  6002      sdl1 100.00       2097  -12.27
- set their weight to 0
cd [your swift-ring.git checkout]/[swift instance] (e.g. eqiad-prod)
for id in {186..197}; do
    for ring in account object container; do
        swift-ring-builder ${ring}.builder set_weight d${id} 0
    done
done
- Alternatively you can, for a given ring,
swift-ring-builder ${ring}.builder set_weight 10.0.6.204 0
- It will prompt you with a list of the devices that will be affected and give you a chance to confirm or cancel.
- check what you've done
git diff -w
- rebalance the rings and distribute them to the rest of the cluster
Replacing a disk without touching the rings
If the time span for replacement is short enough, the failed disk can be left unmounted and swapped with a working one. After a successful replacement it should be added back to the RAID controller and the RAID cache discarded:
megacli -GetPreservedCacheList -a0
megacli -DiscardPreservedCache -L'disk_number' -a0
megacli -CfgEachDskRaid0 WB RA Direct CachedBadBBU -a0
Change info for devices in a ring
- first use the search subcommand to find the devices you want to update. Example, looking for all the devices on 10.0.6.205 with port 6002, which is wrong:
root@ms-be11:~/swift-rings/swift# swift-ring-builder container.builder search z8-10.0.6.205:6002
Devices:    id  zone      ip address  port      name weight partitions balance meta
           202     8      10.0.6.205  6002      sdn3 100.00      13108  -33.33
           203     8      10.0.6.205  6002      sdm3 100.00      13108  -33.33
root@ms-be11:~/swift-rings/swift#
- next, if it showed you the right devices, use the set_info subcommand to replace the incorrect info. In this example, the port number is wrong, so we update it as follows:
root@ms-be11:~/swift-rings/swift# swift-ring-builder container.builder set_info z8-10.0.6.205:6002 10.0.6.205:6001
Matched more than one device:
    d202z8-10.0.6.205:6002/sdn3_""
    d203z8-10.0.6.205:6002/sdm3_""
Are you sure you want to update the info for these 2 devices? (y/N) y
Device d202z8-10.0.6.205:6002/sdn3_"" is now d202z8-10.0.6.205:6001/sdn3_""
Device d203z8-10.0.6.205:6002/sdm3_"" is now d203z8-10.0.6.205:6001/sdm3_""
root@ms-be11:~/swift-rings/swift#
- check your work:
root@ms-be11:~/swift-rings/swift# swift-ring-builder container.builder
container.builder, build version 809
65536 partitions, 3 replicas, 4 zones, 10 devices, 33.33 balance
The minimum number of hours before a partition can be reassigned is 3
Devices:    id  zone      ip address  port      name weight partitions balance meta
           186     8      10.0.6.204  6001      sda3 100.00      19660   -0.00
           187     8      10.0.6.204  6001      sdb3 100.00      19660   -0.00
           194    12      10.0.6.208  6001      sda3 100.00      21845   11.11
           195    12      10.0.6.208  6001      sdb3 100.00      21845   11.11
           198    14      10.0.6.212  6001      sda3 100.00      21846   11.11
           199    14      10.0.6.212  6001      sdb3 100.00      21846   11.11
           200    15      10.0.6.213  6001      sda3 100.00      21845   11.11
           201    15      10.0.6.213  6001      sdb3 100.00      21845   11.11
           202     8      10.0.6.205  6001      sdn3 100.00      13108  -33.33
           203     8      10.0.6.205  6001      sdm3 100.00      13108  -33.33
- and now write the rings (you don't rebalance them, because you don't actually change partitioning):
root@ms-be11:~/swift-rings/swift# swift-ring-builder container.builder write_ring
root@ms-be11:~/swift-rings/swift#
- now you are ready to push out the rings via puppet as described in Swift/How_To#Rebalance_the_rings.
Nuke a swift cluster
only do this on test clusters - it is unrecoverable and destroys all the data in the cluster
- on all servers:
- stop all services:
swift-init all stop
- remove all ring data:
rm /etc/swift/*.{builder,ring.gz}
- on the storage nodes:
- remove all storage content:
for i in /srv/swift-storage/sd*; do rm -r $i/*& done
(or just reformat the drives - faster)
The swift cluster is now destroyed. To rebuild, follow the instructions in Swift/Setup_New_Swift_Cluster
Change the origin server
No longer needed
- in puppet
- switch this line in /etc/swift/*.conf - thumbhost = ms5.pmtpa.wmnet to new server