How to do a schema change

Whatever you do, do not run update.php on the cluster. Unless you want Roan to come slap you in the face:

RoanKattouw_away: And if anyone runs update.php I'll be on the first flight to SFO to slap them in the face

TODO: add greg-g's email requesting that maintenance scripts be run via mwscript on terbium

Major Migrations

Major migrations (such as those for new releases of MediaWiki which often touch multiple large tables) are typically run via a custom script that automatically runs through every slave server defined in db.php (or db-secondary.php, to cover our backup datacenter). We've been creating a version of this for each MediaWiki release >= 1.17. It strives to be idempotent and fault tolerant. See the version used for 1.19 here: https://www.mediawiki.org/wiki/Special:Code/MediaWiki/111122.

The execution process for 1.19 was as follows:

  • mwscript maintenance/upgrade-1.19wmf1-1.php --wiki test2wiki --secondary
    • When --secondary is given, the script runs against the hosts defined in db-secondary.php. These are slaves not used by the current live site, such as those in the alternate datacenter and the analytics slaves.
  • mwscript maintenance/upgrade-1.19wmf1-1.php --wiki test2wiki
    • Runs on all slaves for all wikis (the --wiki option is required for mwscript but not used here)
  • All slaves are now updated and the master for each shard needs to be rotated. See [Master switch] - the first procedure there went very smoothly for 1.19.
    • Run the upgrade script again after one or more masters have been switched, and repeat until all are done (see the consolidated sketch below).
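
In condensed form, the whole rollout looks roughly like this (a sketch only; the script strives to be idempotent, so re-running it after each master switch is the intended pattern, and the number of re-runs depends on how many shards are being switched):

  # 1. Patch the secondary (backup-datacenter / analytics) slaves
  mwscript maintenance/upgrade-1.19wmf1-1.php --wiki test2wiki --secondary

  # 2. Patch all live slaves on every shard
  mwscript maintenance/upgrade-1.19wmf1-1.php --wiki test2wiki

  # 3. For each shard: rotate the master (see [Master switch]), then re-run
  #    the script so the demoted master gets patched as a slave
  mwscript maintenance/upgrade-1.19wmf1-1.php --wiki test2wiki
  # ...repeat step 3 until every shard's master has been rotated and patched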

Possible improvements:

  • Online Schema Changes -- If we could use an OSC tool, my current thinking is to update all slaves with the current method, then apply the change to masters without switches using an OSC wrapper. OSC changes could also be run on slaves, but for very large tables they would still cause replication delays; on masters (where writes aren't single threaded) the impact is smaller, apart from the migration taking longer than a regular one.
    • I've tested Percona's pt-online-schema-change tool on an eqiad slave for the revision sha1 alter and it works well, but it has one important schema requirement that many MW tables don't meet: a single-column unique key (see the example invocation after this list). We have many tables with a multi-column primary key, or even no PK or unique keys at all.
    • Facebook's online-schema-change tool requires a PK, so we also can't use it on many tables. It's not clear whether it supports multi-column PKs, which would be worth testing.
    • Either way, we'd have to change the schema of a great many tables (both in core and for extensions) in order to use an OSC tool.
  • Deploy MHA to manage all of our database shards and use it to make master switches easier and faster. We'd continue to use the current schema migration pattern, but it would be easier to run and less likely to impact the site.
  • Shard large tables like revision so changes can be done on smaller pieces, possibly in parallel.
    • Sharding support in MediaWiki would be a major architectural change. It should happen one day to ensure enwiki can follow sensible growth patterns.
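
For reference, a pt-online-schema-change invocation for something like the revision sha1 alter would look roughly as follows. This is an illustrative sketch, not the exact command used in the test: the slave host name and lag threshold are hypothetical, and --dry-run should be swapped for --execute once the output looks sane.

  pt-online-schema-change --dry-run \
      --alter "ADD rev_sha1 varbinary(32) NOT NULL DEFAULT ''" \
      --check-slave-lag h=db1001 \
      --max-lag 5 \
      D=enwiki,t=revision

This works for revision because it has a single-column primary key (rev_id); tables without a single-column unique key would be rejected, as noted above.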

sql.php

The sql.php script can be used to run .sql files with mwscript sql.php --wiki=enwiki patchfile.sql. This is what you'll typically want to use for MediaWiki or extension schema changes, as these .sql files contain magic comments like /*$wgDBTableOptions*/ that are expanded by sql.php.
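
For illustration, a patch file meant to be run through sql.php looks something like this (the table and columns are hypothetical, not an actual core patch). The /*_*/ marker is expanded to the wiki's table prefix and /*$wgDBTableOptions*/ to the configured table options:

  -- patch-example_table.sql (illustrative)
  CREATE TABLE /*_*/example_table (
    ex_id int unsigned NOT NULL PRIMARY KEY AUTO_INCREMENT,
    ex_value varbinary(255) NOT NULL default ''
  ) /*$wgDBTableOptions*/;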

sql command

The sql command can be used to quickly get a MySQL command line. Because different wikis' databases live on different clusters, you need to pass the database name as the first argument, like so: sql enwiki. Note that this will connect to the master database, which means you will have full write access to everything.

This means you need to be careful:

  • Check your write query and be sure you really mean it before pressing Enter
  • At the start of your session, open a new transaction with BEGIN;. After you've done your write queries, use COMMIT; to commit your changes or ROLLBACK; to undo them (see the example session after this list)
  • Don't run write queries that will insert, update or delete more than 500 rows in one go. Instead, run them in batches
    • If you don't know how many rows your query will touch, find out first. This'll also help you verify you got your WHERE clause right
  • Don't run slow or complex read queries on the master. If you can, run them on the labs replicas. If this is not possible, run them on a slave. You can find a list of slaves in wmf-config/db.php.
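
A careful session along those lines might look like this (the table and column names are hypothetical):

  BEGIN;
  -- Check how many rows the WHERE clause matches (and batch the update if it's large)
  SELECT COUNT(*) FROM some_table WHERE some_flag = 0;
  UPDATE some_table SET some_flag = 1 WHERE some_flag = 0;
  -- If the count and the update both look right:
  COMMIT;
  -- If something looks wrong instead:
  ROLLBACK;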

runBatchedQuery.php

This is a fairly simple script that runs a write query over and over again until it touches zero rows. After each iteration, it waits for replication lag to go down if it's above a certain threshold (5 seconds by default) and prints a progress marker. It's designed to be used with UPDATE or DELETE queries with LIMIT 500 at the end (you need to add this yourself), which allows queries touching thousands of rows to be run without replication lag going through the roof as it would when running the query the normal way.

Usage: mwscript runBatchedQuery.php enwiki 'UPDATE ... LIMIT 500;'
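
For example, the hypothetical flag update from the previous section could be run in batches of 500 like this (table and column names are made up):

  mwscript runBatchedQuery.php enwiki 'UPDATE some_table SET some_flag = 1 WHERE some_flag = 0 LIMIT 500;'

Each batch updates at most 500 matching rows, so the query eventually touches zero rows and the script stops.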

Running things on all wikis

To run maintenance scripts like sql.php and runBatchedQuery.php on all wikis, you can use: foreachwiki sql.php patchfile.sql
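
For example, the hypothetical batched query from the previous section could be run everywhere with:

  foreachwiki runBatchedQuery.php 'UPDATE some_table SET some_flag = 1 WHERE some_flag = 0 LIMIT 500;'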

See also