Server admin log/Archive 26

From Wikitech

December 31

  • 23:44 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1061 (duration: 00m 05s)
  • 14:02 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1065, warm up (duration: 00m 06s)
  • 09:47 godog: updating precise-wikimedia from third-party repo (hwraid)
  • 09:45 godog: previous reprepro update also accidentally updated elasticsearch in trusty-wikimedia to 1.3.7
  • 09:43 godog: updating trusty-wikimedia from third-party repo (hwraid)
  • 02:22 springle: upgrade db1065 trusty
  • 02:16 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1065 (duration: 00m 05s)
  • 02:03 awight: updated payments from 78b72063e4e0cc76b7e168be1e626d5e10e34d4a to 62c81d4574e5e994ff8f3cac7115eff335bd5265
  • 00:52 bd808: restarted elasticsearch on logstash1001
  • 00:49 awight: updated payments from e81f473acc5b31b49dd27714c40f9b71c3462e26 to 78b72063e4e0cc76b7e168be1e626d5e10e34d4a
  • 00:42 bd808: log2udp events still not making it into logstash; possibly related to earlier elasticsearch cluster issues; I don't want to restart elasticsearch on logstash1001 while the cluster is still recovering from that.
  • 00:33 bd808: restarted logstash on logstash1001; log2udp events not being recorded in elasticsearch

December 30

  • 21:52 bd808: restarted elasticsearch on logstash1002; it had dropped from the cluster
  • 20:46 logmsgbot: yurik Synchronized wmf-config/CommonSettings.php: ZeroPortal 182227 (duration: 00m 06s)
  • 19:06 paravoid: manually stopping acct on neon and setting /etc/default/acct ACCT_ENABLE to 0
  • 16:38 godog: killing uwsgi on tungsten; it ran out of memory
  • 14:46 Nemo_bis: morebots is being rude today
  • 14:36 logmsgbot: hoo Synchronized wmf-config/CommonSettings.php: Enable unregistered users editing on it.m.wikipedia.org after Dec 31 (duration: 00m 06s)

December 29

  • 20:19 awight: payments updated from ce7fb9af37c4bba2a84668387b61729df4f9723c to e81f473acc5b31b49dd27714c40f9b71c3462e26
  • 10:35 godog: reboot ms-be2011, stuck while removing a LD, no console

December 27

  • 23:33 paravoid: restarting puppetmasters
  • 20:29 gwicke: dropped old keyspaces titan{,2,3} on xenon to free space for titan4
  • 19:53 ori: gallium: restarted jenkins
  • 16:19 Reedy: jenkins started again...
  • 16:17 Reedy: jenkins killed
  • 16:12 Reedy: attempting to kill jenkins
  • 16:11 Reedy: jenkins is hung with high cpu/memory usage
  • 12:55 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: bump up s1 api load sent to db1066 (duration: 00m 06s)
  • 12:11 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1066 and db1028, warm up (duration: 00m 06s)
  • 05:49 ori: cerium disk space critical, so moved /mnt/data/cassandra/java_{1418354329,1418533386,1418537719}.hprof to /tmp/hprof_files, freeing up ~17G of space.

December 25

  • 20:44 _joe_: restarting hhvm on mw1239, stuck in HPHP::is_valid_var_name probably after trying to call ini_set
  • 00:32 logmsgbot: hoo Synchronized wmf-config/Bug54847.php: Fix for invalid hashes (this prevented some people from logging in) (duration: 00m 05s)
  • 00:26 logmsgbot: spage Synchronized php-1.25wmf12/extensions/PageTriage/modules/ext.pageTriage.views.toolbar/ext.pageTriage.delete.js: Unbreak page curation on enwiki for Xmas (duration: 00m 05s)
  • 00:20 logmsgbot: spage Synchronized php-1.25wmf13/extensions/PageTriage/modules/ext.pageTriage.views.toolbar/ext.pageTriage.delete.js: Unbreak page curation (duration: 00m 06s)

December 24

  • 18:42 paravoid: manually running debmirror on carbon to sync over the holidays; "pkill -f debmirror" should suffice if there is a problem
  • 14:57 akosiaris: disabled puppet on helium while testing copy jobs
  • 13:59 Jeff_Green: package updates and reboots for several fundraising servers...
  • 01:17 hoo: Ran mysql:wikiadmin@db1033 [centralauth]> DELETE FROM bug_54847_password_resets WHERE r_username = 'Stilfehler';

December 23

  • 22:21 logmsgbot: hoo Synchronized wmf-config/: Syncing Kaldari's beta-only change (duration: 00m 07s)
  • 20:34 K4-713: Re-enabled Thank You send job
  • 19:53 logmsgbot: anomie Synchronized wmf-config: Labs-only change (duration: 00m 06s)
  • 19:43 K4-713: disabled Thank You email send job
  • 19:40 chasemp: updated phab sprint app to 0.6.1.4
  • 18:32 paravoid: restarting icinga
  • 15:06 _joe_: gracefully reloading apache on palladium to clean up old puppet master instances
  • 14:50 _joe_: restarted apache on strontium to verify hiera is working
  • 14:17 godog: restart icinga on neon
  • 08:30 _joe_: restarting gitblit, stuck at 100% cpu on a thread
  • 03:25 andrewbogott: graceful'd apache2 on virt1000 (same intermittent passenger crash as always)

December 22

  • 23:43 springle: xtrabackup clone db2010 to db2030
  • 21:26 legoktm: ran delete from localnames where ln_name="Nonoh" and ln_wiki="ruwiki" limit 1; on centralauth for https://phabricator.wikimedia.org/T85041
  • 20:02 awight: update payments from 3dde7be76284aa37b74038dfb4473671999dfcff to ce7fb9af37c4bba2a84668387b61729df4f9723c
  • 19:53 awight: deployed CentralNotice RecordImpression logging for hide cookie bug
  • 19:53 logmsgbot: awight Synchronized php-1.25wmf13/extensions/CentralNotice: RecordImpression logging for CentralNotice hide cookie bug (duration: 00m 06s)
  • 19:53 logmsgbot: awight Synchronized php-1.25wmf12/extensions/CentralNotice: RecordImpression logging for CentralNotice hide cookie bug (duration: 00m 06s)
  • 19:04 anomie: deployed T85113
  • 18:59 YuviPanda: running sync-common on virt1000
  • 17:42 cmjohnson1: taking neon down again to reseat idrac nic card
  • 17:04 cmjohnson1: powering down neon (icinga) to drain flea power and reset idrac
  • 16:37 _joe_: uploading java8 packages for trusty

December 21

December 20

  • 20:45 qchris: restarted webperf service statsd-mw-js-deprecate on hafnium. It seems it did not send metrics to statsd after an EventLogging restart.
  • 02:59 logmsgbot: mattflaschen Synchronized php-1.25wmf13/resources/src/jquery.tipsy/jquery.tipsy.js: Fix "live" deprecated live mode of jQuery tipsy (duration: 00m 05s)
  • 02:59 logmsgbot: mattflaschen Synchronized php-1.25wmf12/resources/src/jquery.tipsy/jquery.tipsy.js: Fix "live" deprecated live mode of jQuery tipsy (duration: 00m 05s)
  • 02:57 logmsgbot: mattflaschen Synchronized php-1.25wmf13/extensions/PageTriage/modules/ext.pageTriage.views.toolbar/: Fix to PageTriage not to use jQuery live (duration: 00m 07s)
  • 02:57 logmsgbot: mattflaschen Synchronized php-1.25wmf12/extensions/PageTriage/modules/ext.pageTriage.views.toolbar/: Fix to PageTriage not to use jQuery live (duration: 00m 05s)
  • 01:18 awight: payments rolled back to 3dde7be76284aa37b74038dfb4473671999dfcff
  • 00:57 awight: payments updated from 3dde7be76284aa37b74038dfb4473671999dfcff to ce7fb9af37c4bba2a84668387b61729df4f9723c
  • 00:35 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1072 warm up. depool db1066 replication error (duration: 00m 05s)

December 19

  • 23:39 awight: payments rolled back ab93b636fae7bcb38a155c019ad102f3b071918c --> 3dde7be76284aa37b74038dfb4473671999dfcff
  • 23:28 awight: payments updated from 3dde7be76284aa37b74038dfb4473671999dfcff to ab93b636fae7bcb38a155c019ad102f3b071918c
  • 23:23 awight: rollback payments to 3dde7be76284aa37b74038dfb4473671999dfcff
  • 23:18 awight: updated payments from 3dde7be76284aa37b74038dfb4473671999dfcff to ab93b636fae7bcb38a155c019ad102f3b071918c
  • 21:27 awight: update crm from ae7b2381667dd65d68812c58f61e3ea66fa9fa6f to 80241fd2a43f03796b416d728661470f875a590a
  • 17:54 hoo: Manually transferred the email from enwiki account "Hob Gadling" to the centralauth account of the same name (after a partially failed account creation).
  • 12:54 logmsgbot: aude Synchronized php-1.25wmf12/extensions/Wikidata/extensions/Wikibase/lib/resources/jquery.wikibase: js caching issues (duration: 00m 05s)
  • 07:15 andrewbogott: disabled puppet and nova-compute on virt1010 and virt1011 until I can sort out a libvirt issue.
  • 06:55 _joe_: restarted HHVM on mw1184, stuck in HPHP::StatCache::refresh
  • 03:36 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1028 (duration: 00m 06s)
  • 02:37 logmsgbot: yurik Synchronized php-1.25wmf13/extensions/ZeroPortal: (no message) (duration: 00m 05s)
  • 02:36 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1055, warm up (duration: 00m 06s)
  • 02:35 Jeff_Green: pay-lvs1001 inadvertently power-cycled
  • 02:19 logmsgbot: ori Synchronized wmf-config/StartProfiler.php: re-enable xenon (duration: 00m 06s)
  • 01:56 logmsgbot: yurik Synchronized php-1.25wmf13/extensions/ZeroPortal: (no message) (duration: 00m 06s)
  • 01:21 logmsgbot: maxsem Synchronized php-1.25wmf13/extensions/VisualEditor/: (no message) (duration: 00m 07s)
  • 01:21 logmsgbot: maxsem Synchronized php-1.25wmf13/extensions/MobileFrontend/: (no message) (duration: 00m 05s)
  • 01:13 logmsgbot: maxsem Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/180818 (duration: 00m 05s)
  • 00:57 logmsgbot: maxsem Synchronized php-1.25wmf12/extensions/VisualEditor/: https://gerrit.wikimedia.org/r/#/c/180860/ (duration: 00m 07s)
  • 00:56 logmsgbot: maxsem Synchronized php-1.25wmf12/resources/lib/oojs-ui/oojs-ui.js: https://gerrit.wikimedia.org/r/#/c/180860/ (duration: 00m 08s)
  • 00:53 logmsgbot: demon Synchronized php-1.25wmf13/includes/Html.php: (no message) (duration: 00m 05s)
  • 00:30 logmsgbot: maxsem Synchronized php-1.25wmf12/extensions/Wikidata/: https://gerrit.wikimedia.org/r/#q,181000,n,z (duration: 00m 13s)
  • 00:28 logmsgbot: maxsem Synchronized php-1.25wmf13/extensions/Wikidata/: https://gerrit.wikimedia.org/r/#q,181003,n,z (duration: 00m 12s)
  • 00:18 logmsgbot: maxsem Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/180867 (duration: 00m 06s)
  • 00:05 Krinkle: Reloading Zuul to deploy I3333f5e45

December 18

  • 23:35 logmsgbot: mattflaschen Finished scap: Deploy changes to Flow to fix preview (both branches) and add commit metadata (1.25wmf13) (duration: 26m 33s)
  • 23:09 logmsgbot: mattflaschen Started scap: Deploy changes to Flow to fix preview (both branches) and add commit metadata (1.25wmf13)
  • 22:54 logmsgbot: demon Synchronized wmf-config/StartProfiler.php: fix profiling again, this time with feeling (duration: 00m 08s)
  • 22:19 logmsgbot: ori Synchronized php-1.25wmf13/includes/parser/MWTidy.php: I7e67a61f7: Revert "Simplify MWTidy" (duration: 00m 05s)
  • 22:19 logmsgbot: ori Synchronized php-1.25wmf12/includes/parser/MWTidy.php: I03cc1f46f: Revert "Simplify MWTidy" (duration: 00m 14s)
  • 20:39 logmsgbot: bd808 Synchronized php-1.25wmf13/includes/profiler/ProfilerXhprof.php: xhprof: backport section profiler fixes (duration: 00m 07s)
  • 20:14 logmsgbot: bd808 Synchronized php-1.25wmf13/tests/phpunit/includes/api/format/ApiFormatWddxTest.php: Skip ApiFormatWddxTest under HHVM (duration: 00m 07s)
  • 19:11 logmsgbot: bd808 Synchronized php-1.25wmf12/includes/profiler/ProfilerXhprof.php: backport section profiler fixes [I5935ee2] (duration: 00m 05s)
  • 18:52 logmsgbot: bd808 Synchronized php-1.25wmf12/includes/utils/IP.php: Log calls to IP::parseRange with invalid array argument [Ie883eb6] (duration: 00m 05s)
  • 18:42 logmsgbot: bd808 Synchronized php-1.25wmf12/tests/phpunit/includes/api/format/ApiFormatWddxTest.php: syncing test fix Ia58ec20 (duration: 00m 06s)
  • 18:25 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: logging for IP argument bugs (duration: 00m 05s)
  • 18:09 bblack: analytics vlan ACLs updated in eqiad
  • 17:59 logmsgbot: demon Synchronized wmf-config/: Disable lsearchd almost everywhere (duration: 00m 07s)
  • 17:42 logmsgbot: demon Synchronized wmf-config/: Remove cirrus-as-alternate settings (duration: 00m 06s)
  • 17:22 bblack: switched to new unified cert on all nginx terminators via config reload
  • 17:18 godog: enabling md write intent bitmap temporarily on virt1009
  • 17:08 hashar: Made mediawiki-phpunit-hhvm Jenkins job voting. We now enforce HHVM compliance for mediawiki/core
  • 16:37 hashar: gallium: deleting obsolete jobs: rm -fR /srv/ssd/jenkins-slave/workspace/*-testextension (they are now suffixed with -zend and -hhvm)
  • 16:35 godog: deleting jenkins workspaces on lanthanum older than 30d
  • 16:32 hashar: lanthanum: deleting obsolete jobs: rm -fR /srv/ssd/jenkins-slave/workspace/*-testextension (they are now suffixed with -zend and -hhvm)
  • 16:24 logmsgbot: demon Synchronized wmf-config/StartProfiler.php: disable normal eqiad profiling (duration: 00m 06s)
  • 16:13 logmsgbot: manybubbles Synchronized php-1.25wmf12/extensions/MultimediaViewer/: SWAT backport last-modified performance logging for mediaviewer (duration: 00m 05s)
  • 16:06 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: SWAT disable thumbnail caching (duration: 00m 05s)
  • 16:06 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings-labs.php: SWAT disable thumbnail caching (duration: 00m 05s)
  • 15:43 _joe_: rebooted mw1191
  • 15:07 logmsgbot: demon Synchronized wmf-config/StartProfiler.php: disable xhprof-backed flame graphs for now (duration: 00m 05s)
  • 14:57 _joe_: rebooting mw1257
  • 14:44 _joe_: restarting hhvm on mw1191
  • 14:01 qchris: EventLogging: deployed 937d804 & restarted EventLogging
  • 13:54 akosiaris: purged unpuppetized rrdcached from hafnium. It was segfaulting when started via the init script, which left the package unconfigured, which in turn triggered dpkg alerts in icinga
  • 13:42 _joe|lunch: restarting hhvm on a few servers
  • 13:11 _joe_: restarted hhvm on mw1242, stuck in getrusage()
  • 13:03 _joe_: restarted hhvm on mw1191, load at 200
  • 13:00 paravoid: salt-cleaning up /etc/sudoers.d/50_* (old naming scheme)
  • 12:04 godog: upload carbon-c-relay 0.36+git20141218-1 to trusty-wikimedia
  • 09:13 hashar: enabled MediaWiki core 'structure' PHPUnit tests for all extensions. Will require folks to fix their incorrect AutoLoader and ResourceLoader entries. 180496 bug T78798
  • 06:23 _joe|justawake: restarted the puppetmaster on palladium
  • 04:35 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1055 (duration: 04m 19s)
  • 01:42 logmsgbot: ori Synchronized php-1.25wmf12/resources/src/mediawiki.action/mediawiki.action.edit.stash.js: Ibb29a825c: mediawiki.action.edit.stash: set timeout to 4 seconds (duration: 00m 05s)
  • 01:31 awight: update crm from f1e558592ee98ff8fc84d19ff2c0435619e11242 to ae7b2381667dd65d68812c58f61e3ea66fa9fa6f
  • 01:25 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1072 (duration: 00m 08s)
  • 01:22 springle: mw1191 restarted hhvm, apparently stuck in futex
  • 00:38 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1066, warm up (duration: 00m 05s)
  • 00:24 hashar: Restarting Jenkins to remove a deadlock on deployment-bastion slave
  • 00:19 logmsgbot: maxsem Synchronized php-1.25wmf13/extensions/Flow/: https://gerrit.wikimedia.org/r/#/c/180677/ (duration: 00m 08s)
  • 00:01 logmsgbot: maxsem Synchronized wmf-config/: https://gerrit.wikimedia.org/r/#/c/180376/ - no-op for prod (duration: 00m 06s)

December 17

  • 23:34 hashar: Restarted Jenkins and Zuul again to have a clean start while I am crashing to bed.
  • 23:22 logmsgbot: demon Synchronized wmf-config/StartProfiler.php: xhprof on all hhvm hosts in eqiad (duration: 00m 05s)
  • 22:46 hashar: restarting Jenkins
  • 21:45 hashar: killing Jenkins
  • 21:41 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.25wmf13
  • 21:40 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias to 1.25wmf12
  • 21:38 logmsgbot: reedy Finished scap: testwiki to 1.25wmf13 and build l10n cache (duration: 12m 26s)
  • 21:25 logmsgbot: reedy Started scap: testwiki to 1.25wmf13 and build l10n cache
  • 21:24 Reedy: "WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!" for mw1152
  • 20:36 hashar: Jenkins/Zuul had a deadlock. Disconnecting/reconnecting slaves did not fix it; finally had to disconnect/reconnect the Gearman client in Jenkins, and it is processing again.
  • 20:36 logmsgbot: reedy Started scap: testwiki to 1.25wmf13 and build l10n cache
  • 20:12 hashar: Jenkins: some slaves are no longer properly registered. Unpooling/repooling them
  • 16:42 logmsgbot: demon Synchronized php-1.25wmf12/extensions/TextExtracts/: (no message) (duration: 00m 05s)
  • 16:27 logmsgbot: demon Synchronized wmf-config/CirrusSearch-labs.php: for completeness (duration: 00m 05s)
  • 16:20 ^d: mw1190: manually ran sync-common since it was yelling about my key earlier
  • 16:14 logmsgbot: demon Synchronized php-1.25wmf12/includes/specials/SpecialSearch.php: (no message) (duration: 00m 06s)
  • 16:10 logmsgbot: demon Synchronized php-1.25wmf12/extensions/Wikidata/: (no message) (duration: 00m 12s)
  • 14:52 akosiaris: uploaded apertium-nno-nob_1.0.0+svn~57977-1 to apt.wikimedia.org
  • 14:43 anomie: Merged and fetched gerrit:180477, so undeployed bad extension changes from gerrit:180229 are no longer a danger
  • 13:47 akosiaris: uploaded apertium-nob_0.1.0+svn~58076-1 and apertium-nno_0.1.0+svn~58076-1 to apt.wikimedia.org
  • 11:59 _joe_: removing some core dumps from appservers, so that we don't run out of space by tomorrow
  • 11:52 Nemo_bis: Don't sync extensions, undeployed unintentional reverts https://wikitech.wikimedia.org/?diff=138472&oldid=138399
  • 10:54 hashar: Jenkins deleting legacy 'mwext*testextension' jobs (now suffixed with '-zend') and restarting Jenkins.
  • 10:40 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1064, warm up (duration: 00m 05s)
  • 10:30 andrewbogott: virt1010 and 1011 are up but with puppet and nova-compute disabled pending firewall issues
  • 10:29 hashar: mw1152 is a jobrunner being rebuilt
  • 10:26 hashar: mw1152 has a wrong host key in /etc/ssh/ssh_known_hosts:2480, causing scap to emit a remote host identification error.
  • 10:26 logmsgbot: hashar Synchronized wmf-config/throttle.php: 180429 - Throttle rule for University of Haifa event (duration: 00m 06s)
  • 10:25 logmsgbot: hashar Synchronized wmf-config/throttle.php: 180429 - Throttle rule for University of Haifa event (duration: 00m 06s)
  • 09:57 _joe_: jobrunner started on mw1152
  • 08:43 _joe_: depooling mw1152, reimaging as an HAT jobrunner
  • 07:52 godog: increase minimum raid reconstruction speed on virt1005 and virt1009
  • 06:52 springle: upgrade db1064 trusty
  • 06:14 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1064 (duration: 00m 06s)
  • 04:39 springle: mw1015 sync-common
  • 04:21 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1066 (duration: 00m 05s)
  • 03:11 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1073, warm up (duration: 00m 06s)
  • 02:48 ori: restarted jobrunner on jobrunners
  • 02:42 logmsgbot: ori Synchronized php-1.25wmf12/includes/parser/MWTidy.php: I4909e5e20: use stream_select() to get external tidy stdout/stderr (uncommitted; pending review) (duration: 00m 33s)
  • 01:01 logmsgbot: maxsem Synchronized php-1.25wmf12/extensions/Wikidata/: https://gerrit.wikimedia.org/r/180368 (duration: 00m 59s)
  • 00:59 logmsgbot: maxsem Synchronized php-1.25wmf12/extensions/Flow/: https://gerrit.wikimedia.org/r/#/c/180303/ (duration: 00m 41s)
  • 00:56 logmsgbot: maxsem Synchronized php-1.25wmf12/includes/: https://gerrit.wikimedia.org/r/#/c/180214/ part 2 (duration: 01m 38s)
  • 00:54 logmsgbot: maxsem Synchronized php-1.25wmf12/autoload.php: https://gerrit.wikimedia.org/r/#/c/180214/ part 1 (duration: 00m 26s)
  • 00:51 logmsgbot: maxsem Synchronized wmf-config/: (no message) (duration: 00m 52s)
  • 00:33 logmsgbot: maxsem Synchronized wmf-config/Wikibase.php: https://gerrit.wikimedia.org/r/179469 (duration: 01m 22s)
  • 00:24 logmsgbot: tstarling Synchronized php-1.25wmf12/extensions/SecurePoll/includes/crypt/Crypt.php: tallying fix (duration: 01m 04s)
  • 00:15 bblack: killed runJobs procs on mw1015 with init as parent

December 16

  • 23:55 Tim: fixed MW cgroup on tin
  • 23:47 logmsgbot: tstarling Synchronized wmf-config/CommonSettings.php: SecurePoll debugging (duration: 01m 01s)
  • 23:32 logmsgbot: legoktm Synchronized wmf-config/CommonSettings.php: Temporarily disable wgCentralAuthAutoMigrate (duration: 01m 17s)
  • 23:31 ori: disabled puppet on tin and removed mw1015 from mediawiki-installation dsh group
  • 22:34 hoo: Updated the Wikidata property suggester with data from Monday's JSON dump
  • 20:43 Jamesofur: inserted decryption key for English Wikipedia Arbitration Committee Election (2014)
  • 20:35 twentyafterfour: spam
  • 19:50 logmsgbot: twentyafterfour Synchronized wmf-config/Wikibase.php: (no message) (duration: 00m 05s)
  • 19:41 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group1 to 1.25wmf12
  • 19:19 Krinkle: Reloading Zuul to deploy Id2cfcdfd56220
  • 19:18 awight: update crm from 28b68e23b670fe52a401659bde800b64d05e25bf to f1e558592ee98ff8fc84d19ff2c0435619e11242
  • 19:15 logmsgbot: catrope Synchronized php-1.25wmf12/skins/Vector/: Revert watch star change (duration: 00m 05s)
  • 17:30 legoktm: deleted apparently invalid timecorrection preference for user_id=68157 on simplewiki
  • 16:44 logmsgbot: anomie Synchronized php-1.25wmf12/extensions/Translate: SWAT: Translate: Revert "Request csrf tokens in JS when supported" gerrit:180201 (duration: 01m 06s)
  • 16:31 cmjohnson: removing mw1192 from pybal and disabling puppet for hardware troubleshooting
  • 16:22 logmsgbot: anomie Synchronized php-1.25wmf12/extensions/Wikidata/: SWAT: extensions/Wikidata to 9d03a1df13ede425673da9ce57c440b59e867aa6 gerrit:180184 (duration: 00m 21s)
  • 16:01 logmsgbot: anomie Synchronized wmf-config: SWAT: Enable $wgExtractsExtendOpenSearchXml gerrit:179168 (duration: 00m 07s)
  • 15:58 _joe_: load test done, the apache appserver pool can work flawlessly with 110 servers in the pool
  • 15:00 _joe_: depooling part of the apache appserver pool to assess current load
  • 13:20 mutante: started lighttpd on sodium
  • 09:31 godog: upgrade diamond in trusty/eqiad
  • 09:29 godog: upgrade diamond in trusty/esams
  • 08:42 godog: upgrading trusty/codfw to diamond 3.5-3
  • 07:05 springle: upgrade db1073 trusty
  • 06:39 springle: (06:32) sync-common on mw1043 after sync-file fail
  • 06:38 springle: (logmsgbot) springle Synchronized wmf-config/db-eqiad.php: depool db1073 (duration: 00m 06s)
  • 03:46 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Dec 16 03:46:47 UTC 2014 (duration 46m 46s)
  • 02:37 springle: upgrade db1055 trusty
  • 02:24 logmsgbot: LocalisationUpdate completed (1.25wmf12) at 2014-12-16 02:23:58+00:00
  • 02:23 logmsgbot: l10nupdate Synchronized php-1.25wmf12/cache/l10n: (no message) (duration: 00m 02s)
  • 02:12 logmsgbot: LocalisationUpdate completed (1.25wmf11) at 2014-12-16 02:12:20+00:00
  • 02:12 logmsgbot: l10nupdate Synchronized php-1.25wmf11/cache/l10n: (no message) (duration: 00m 01s)
  • 02:06 logmsgbot: aude Finished scap: Update test.wikidata (duration: 18m 38s)
  • 01:47 logmsgbot: aude Started scap: Update test.wikidata
  • 01:36 logmsgbot: yurik Synchronized php-1.25wmf12/extensions/ZeroBanner: (no message) (duration: 00m 05s)
  • 01:34 logmsgbot: yurik Synchronized php-1.25wmf11/extensions/ZeroBanner: (no message) (duration: 00m 09s)
  • 01:30 logmsgbot: yurik Synchronized php-1.25wmf12/extensions/ZeroPortal: (no message) (duration: 00m 07s)
  • 01:20 logmsgbot: yurik Synchronized php-1.25wmf11/extensions/ZeroBanner: (no message) (duration: 00m 06s)
  • 01:15 logmsgbot: maxsem Finished scap: (no message) (duration: 26m 35s)
  • 00:49 logmsgbot: maxsem Started scap: (no message)
  • 00:37 logmsgbot: maxsem Synchronized php-1.25wmf12/extensions/Flow/: https://gerrit.wikimedia.org/r/#/c/180043/ (duration: 00m 08s)
  • 00:34 awight: payments updated from f3fd79aaaf730f8fd18a72f83c11e9cc111a0aab to 3dde7be76284aa37b74038dfb4473671999dfcff
  • 00:29 logmsgbot: maxsem Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/179981 (duration: 00m 07s)
  • 00:12 logmsgbot: maxsem Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/179513 (duration: 00m 06s)

December 15

  • 23:43 awight: update payments from 21afbcf24c0e2124f783cc3c2c65621569675d6f to f3fd79aaaf730f8fd18a72f83c11e9cc111a0aab
  • 23:18 csteipp: deploy patch for T71209
  • 23:18 csteipp: redeploy patches for T77624 & T76195
  • 20:23 YuviPanda: cleaned out /var/log/wikidatadumps on snapshot1003 because hoo needs them anyway?
  • 18:27 logmsgbot: bd808 Synchronized wmf-config/CommonSettings.php: Set wgTranslateTranslationServices['TTMServer']['cutoff'] [I138b22a] (duration: 00m 07s)
  • 18:10 logmsgbot: bd808 Synchronized wmf-config/InitialiseSettings.php: Sample the GlobalTitleFail log at 1:10000 [I280ac3d] (duration: 00m 07s)
  • 17:37 logmsgbot: bd808 Synchronized wmf-config/InitialiseSettings.php: Enable MWLoggerMonologSpi for group0 wikis [I2f72f97] (duration: 00m 05s)
  • 17:26 logmsgbot: bd808 Synchronized wmf-config/InitialiseSettings.php: Enable MWLoggerMonologSpi for testwiki [I419eb0d] (duration: 00m 05s)
  • 17:23 logmsgbot: bd808 Synchronized docroot/noc/createTxtFileSymlinks.sh: Optional MWLoggerMonologSpi configuration [I720f2cb] (for real this time) (duration: 00m 06s)
  • 17:22 logmsgbot: bd808 Synchronized wmf-config: Optional MWLoggerMonologSpi configuration [I720f2cb] (for real this time) (duration: 00m 06s)
  • 17:14 logmsgbot: bd808 Synchronized docroot/noc/createTxtFileSymlinks.sh: Optional MWLoggerMonologSpi configuration [I720f2cb] (duration: 00m 05s)
  • 17:13 logmsgbot: bd808 Synchronized wmf-config: Optional MWLoggerMonologSpi configuration [I720f2cb] (duration: 00m 05s)
  • 17:10 logmsgbot: bd808 Synchronized wmf-config/InitialiseSettings.php: Introduce wmgUseMonologLogger feature flag [I61fa967] (duration: 00m 07s)
  • 16:33 logmsgbot: marktraceur Synchronized php-1.25wmf12/extensions/UploadWizard/: [SWAT] [wmf12] Fix Flickr imports in UploadWizard (duration: 00m 05s)
  • 16:30 logmsgbot: marktraceur Synchronized php-1.25wmf11/extensions/UploadWizard/: [SWAT] [wmf11] Fix Flickr imports in UploadWizard (duration: 00m 05s)
  • 16:21 logmsgbot: marktraceur Synchronized php-1.25wmf11/extensions/MultimediaViewer/: [SWAT] [wmf11] - Track the most recent upload time for performance events (Media Viewer) (duration: 00m 05s)
  • 16:12 logmsgbot: marktraceur Synchronized php-1.25wmf12/extensions/Wikidata/: [SWAT] [wmf12] - Update test.wikidata (fixes/polish for changes to the site link section, and performance improvements for page views). (duration: 00m 24s)
  • 15:31 godog: upload diamond 3.5-3 to trusty-wikimedia
  • 14:01 godog: reinstall python-twisted-bin python-twisted-core python-twisted-web on labmon1001
  • 14:00 robh: zinc removed from icinga, system is now shutdown for reclaim per RT8939
  • 13:50 robh: reclaiming zinc to spares, stopped puppet agent
  • 13:13 akosiaris: uploaded hfst_3.8.1~r4088-1 to apt.wikimedia.org (trusty)
  • 11:51 hashar: Zuul: clearing out some old zuul git references ( https://phabricator.wikimedia.org/T70481 ). Running in a screen on gallium
  • 09:48 hashar: Upgrading composer on CI to v1.0.0-alpha9 178550
  • 06:57 logmsgbot: ori Synchronized wmf-config/InitialiseSettings.php: Id9023e66c: Sample "api" debug log group at 1:1000 (duration: 00m 06s)
  • 03:36 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Dec 15 03:36:24 UTC 2014 (duration 36m 23s)
  • 02:15 logmsgbot: LocalisationUpdate completed (1.25wmf12) at 2014-12-15 02:15:01+00:00
  • 02:15 logmsgbot: l10nupdate Synchronized php-1.25wmf12/cache/l10n: (no message) (duration: 00m 01s)
  • 02:09 logmsgbot: LocalisationUpdate completed (1.25wmf11) at 2014-12-15 02:09:21+00:00
  • 02:09 logmsgbot: l10nupdate Synchronized php-1.25wmf11/cache/l10n: (no message) (duration: 00m 02s)

December 14

  • 03:33 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Dec 14 03:33:14 UTC 2014 (duration 33m 13s)
  • 02:16 logmsgbot: LocalisationUpdate completed (1.25wmf12) at 2014-12-14 02:16:31+00:00
  • 02:16 logmsgbot: l10nupdate Synchronized php-1.25wmf12/cache/l10n: (no message) (duration: 00m 01s)
  • 02:11 logmsgbot: LocalisationUpdate completed (1.25wmf11) at 2014-12-14 02:11:43+00:00
  • 02:11 logmsgbot: l10nupdate Synchronized php-1.25wmf11/cache/l10n: (no message) (duration: 00m 02s)

December 13

  • 12:20 andrewbogott: graceful'd apache2 on virt1000; puppet master was acting up.
  • 09:51 hashar: Restarting Jenkins to get rid of some deadlocks that occurred yesterday
  • 03:29 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Dec 13 03:29:31 UTC 2014 (duration 29m 30s)
  • 02:14 logmsgbot: LocalisationUpdate completed (1.25wmf12) at 2014-12-13 02:14:00+00:00
  • 02:14 logmsgbot: l10nupdate Synchronized php-1.25wmf12/cache/l10n: (no message) (duration: 00m 01s)
  • 02:09 logmsgbot: LocalisationUpdate completed (1.25wmf11) at 2014-12-13 02:09:28+00:00
  • 02:09 logmsgbot: l10nupdate Synchronized php-1.25wmf11/cache/l10n: (no message) (duration: 00m 01s)
  • 02:04 logmsgbot: krinkle Synchronized w/robots.php: 54746fdef3402 (duration: 00m 05s)
  • 00:58 logmsgbot: krinkle Synchronized w/robots.php: 611892c62349d09c9758 (duration: 00m 06s)

December 12

  • 22:06 logmsgbot: ori Synchronized wmf-config/StartProfiler.php: (no message) (duration: 00m 06s)
  • 19:47 ottomata1: initiating kafka preferred-replica-election to bring analytics1021 back into leadership :/ need to figure this out, or replace this node soon.
  • 18:15 YuviPanda: ran sudo logrotate -f /etc/logrotate.d/dumpwikidatajson on snapshot1003 for hoo
  • 18:13 logmsgbot: ori Synchronized wmf-config/StartProfiler.php: (no message) (duration: 00m 08s)
  • 16:42 akosiaris: uploaded apertium-sv-da, apertium-en-ca to apt.wikimedia.org
  • 15:13 hashar: Zuul: reverting back to wmf-deploy-20141030-4. I had previously reverted it to another change, which was wrong.
  • 14:59 hashar: Zuul status page is no more. https://phabricator.wikimedia.org/T78400
  • 14:50 hashar: upgrading python-statsd on Zuul server and restarting service.
  • 14:37 godog: upload python-statsd 3.0.1-1 to precise-wikimedia
  • 14:13 godog: upload python-statsd 3.0.1-1 to trusty-wikimedia
  • 11:43 YuviPanda: force puppet run on all labs hosts via salt
  • 09:33 ori: restarted mwprof on tungsten
  • 09:20 logmsgbot: ori Synchronized wmf-config/StartProfiler.php: I63864cc79: xenon log: collate stack samples and fold into single lines (duration: 00m 06s)
  • 06:05 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1055 (duration: 00m 05s)
  • 03:50 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Dec 12 03:50:10 UTC 2014 (duration 50m 9s)
  • 03:28 logmsgbot: yurik Synchronized php-1.25wmf12/extensions/ZeroPortal/: updating ZeroPortal to master - urgent bugfix - retry (duration: 00m 10s)
  • 03:27 logmsgbot: yurik Synchronized php-1.25wmf12/extensions/ZeroPortal/: updating ZeroPortal to master - urgent bugfix (duration: 00m 05s)
  • 03:25 logmsgbot: ori Synchronized wmf-config: I1d218c2d6: Log xenon-captured traces via wfDebugLog (duration: 00m 06s)
  • 02:55 andrewbogott: rebooted mw1041 from mgmt
  • 02:20 logmsgbot: LocalisationUpdate completed (1.25wmf12) at 2014-12-12 02:20:37+00:00
  • 02:20 logmsgbot: l10nupdate Synchronized php-1.25wmf12/cache/l10n: (no message) (duration: 00m 03s)
  • 02:15 logmsgbot: LocalisationUpdate completed (1.25wmf11) at 2014-12-12 02:15:34+00:00
  • 02:15 logmsgbot: l10nupdate Synchronized php-1.25wmf11/cache/l10n: (no message) (duration: 00m 03s)
  • 00:41 logmsgbot: maxsem Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/179360 (duration: 00m 06s)
  • 00:35 godog: profiler-to-carbon is logging too much on tungsten; cause not yet known, but don't restart
  • 00:30 logmsgbot: maxsem Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/179359 (duration: 00m 11s)
  • 00:25 logmsgbot: maxsem Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/179341/ (duration: 00m 05s)
  • 00:17 logmsgbot: maxsem Synchronized php-1.25wmf11/extensions/MobileFrontend: (no message) (duration: 00m 06s)
  • 00:17 logmsgbot: maxsem Synchronized php-1.25wmf11/extensions/WikiGrok/: (no message) (duration: 00m 06s)
  • 00:15 logmsgbot: maxsem Synchronized php-1.25wmf12/extensions/WikiGrok/: (no message) (duration: 00m 06s)
  • 00:15 logmsgbot: maxsem Synchronized php-1.25wmf12/extensions/MobileFrontend/: (no message) (duration: 00m 05s)
  • 00:09 godog: stop profiler-to-carbon on tungsten

December 11

  • 23:44 bd808: restarted logstash on logstash1001; fatalmonitor report was empty since ~20:30z
  • 23:35 logmsgbot: bd808 Synchronized wmf-config: Revert Configure logging to use MWLoggerMonologSpi (Ib8ddd86) (duration: 00m 05s)
  • 23:33 logmsgbot: bd808 Synchronized wmf-config: quick revert -- Configure logging to use MWLoggerMonologSpi (I99a032f) (duration: 00m 07s)
  • 23:30 logmsgbot: bd808 Synchronized wmf-config: Configure logging to use MWLoggerMonologSpi (I99a032f) (duration: 00m 09s)
  • 23:05 logmsgbot: maxsem Finished scap: i18n update for CentralNotice (duration: 29m 09s)
  • 22:56 cscott: updated Parsoid to version d16dd2db
  • 22:50 cscott: updated OCG to version bfc3812ef346c9f767135b339cedd123a1bcac98
  • 22:45 hashar: Disconnected/reconnected the Jenkins Gearman client which unstuck Zuul magically.
  • 22:42 hashar: Zuul stuck
  • 22:36 logmsgbot: maxsem Started scap: i18n update for CentralNotice
  • 21:47 hashar: Jenkins: re-adding integration-slave1009 to the pool of slaves
  • 21:09 logmsgbot: LocalisationUpdate completed (1.25wmf12) at 2014-12-11 21:09:26+00:00
  • 21:09 logmsgbot: awight Synchronized php-1.25wmf12/cache/l10n: (no message) (duration: 00m 01s)
  • 20:58 logmsgbot: LocalisationUpdate completed (1.25wmf11) at 2014-12-11 20:58:24+00:00
  • 20:58 logmsgbot: awight Synchronized php-1.25wmf11/cache/l10n: (no message) (duration: 00m 03s)
  • 20:13 logmsgbot: awight Synchronized php-1.25wmf12/extensions/CentralNotice: IE fix for CentralNotice hide cookies (duration: 00m 06s)
  • 20:13 logmsgbot: awight Synchronized php-1.25wmf11/extensions/CentralNotice: IE fix for CentralNotice hide cookies (duration: 00m 07s)
  • 19:30 chrisjohnson: powering down tmh1002 to replace failed disk
  • 19:21 legoktm: rescuing revisions on frwiki (https://phabricator.wikimedia.org/T76979)
  • 18:41 godog: restart profiler-to-carbon on tungsten to pick up changes, including hhvm-profiler-to-carbon
  • 18:20 logmsgbot: ori Synchronized php-1.25wmf12/resources/src/mediawiki.action/mediawiki.action.edit.stash.js: Ib2de3f15: Stash edit when user idles (duration: 00m 05s)
  • 18:16 ejegg: updated dash from 08b078acf904d563030ff7a37b2af8df88387e29 to 6631a97e5e3e688bc0f4d2a1f6f5d97744dba0f4
  • 17:41 ottomata: starting trusty upgrade of analytics1019
  • 17:18 paravoid: restarting apache on strontium
  • 17:06 _joe_: restarting HHVM on mw1237, stuck in HPHP::StatCache::refresh
  • 16:59 godog: restarted gmond on ms-fe1001, all swift machines under this aggregator were showing offline
  • 16:50 logmsgbot: marktraceur Synchronized wmf-config/: [SWAT] [config] Actually Revert 'Configure logging to use MWLoggerMonologSpi' (duration: 00m 10s)
  • 16:44 logmsgbot: marktraceur Synchronized wmf-config/: [SWAT] [config] Revert Configure logging to use MWLoggerMonologSpi (duration: 00m 05s)
  • 16:44 ottomata: starting trusty upgrade of analytics1011
  • 16:43 logmsgbot: marktraceur Synchronized wmf-config/: [SWAT] [config] Configure logging to use MWLoggerMonologSpi (duration: 00m 07s)
  • 16:28 logmsgbot: marktraceur Synchronized private/PrivateSettings.php: [SWAT] [config] Add password for logstash (duration: 00m 10s)
  • 16:25 logmsgbot: marktraceur Synchronized php-1.25wmf12/extensions/WikimediaEvents/WikimediaEvents.php: [SWAT] [wmf12] Bump sendBeacon schema revision so new URL will be generated (duration: 00m 16s)
  • 16:23 logmsgbot: marktraceur Synchronized php-1.25wmf11/extensions/WikimediaEvents/WikimediaEvents.php: [SWAT] [wmf11] Bump sendBeacon schema revision so new URL will be generated (duration: 00m 14s)
  • 16:12 logmsgbot: marktraceur Synchronized wmf-config/InitialiseSettings.php: [SWAT] [config] Redisable WikiGrok on enwiki (duration: 00m 05s)
  • 16:07 logmsgbot: marktraceur Synchronized wmf-config/InitialiseSettings.php: [SWAT] [config] Reenable WikiGrok on enwiki (duration: 00m 07s)
  • 15:59 ottomata: starting trusty upgrade of analytics1033
  • 15:04 hashar: @damons we love you!
  • 15:01 hashar: saved Jenkins configuration via the web interface to reset the interface language from Chinese to English
  • 13:31 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: pool db1004 in s7, warm up (duration: 00m 06s)
  • 06:02 ori: restarted apache on palladium and strontium
  • 04:08 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Dec 11 04:08:50 UTC 2014 (duration 8m 49s)
  • 04:08 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Dec 11 04:08:33 UTC 2014 (duration 30m 22s)
  • 03:34 logmsgbot: ori Synchronized php-1.25wmf11/extensions/Math: Ic438b307a3b46: Fix for fatal caused by static call to MathRenderer::getError (duration: 00m 06s)
  • 02:57 Krinkle: git-deploy: Deploying integration/mediawiki-tools-codesniffer I602cb6cfe910fc0a
  • 02:45 springle: xtrabackup clone db1007 to db1004
  • 02:12 logmsgbot: LocalisationUpdate completed (1.25wmf12) at 2014-12-11 02:12:04+00:00
  • 02:12 logmsgbot: l10nupdate Synchronized php-1.25wmf12/cache/l10n: (no message) (duration: 00m 01s)
  • 02:10 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1015, warm up (duration: 00m 08s)
  • 02:09 logmsgbot: LocalisationUpdate completed (1.25wmf12) at 2014-12-11 02:09:43+00:00
  • 02:09 logmsgbot: yurik Synchronized php-1.25wmf12/cache/l10n: (no message) (duration: 00m 01s)
  • 02:08 logmsgbot: LocalisationUpdate completed (1.25wmf11) at 2014-12-11 02:08:34+00:00
  • 02:08 logmsgbot: l10nupdate Synchronized php-1.25wmf11/cache/l10n: (no message) (duration: 00m 01s)
  • 01:57 springle: upgrade db1015 trusty
  • 01:56 logmsgbot: LocalisationUpdate completed (1.25wmf11) at 2014-12-11 01:56:51+00:00
  • 01:56 logmsgbot: yurik Synchronized php-1.25wmf11/cache/l10n: (no message) (duration: 00m 01s)
  • 01:41 logmsgbot: maxsem Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/179036 (duration: 00m 06s)
  • 01:38 csteipp: redeploy core fixes for wmf12
  • 01:34 logmsgbot: maxsem Finished scap: Noop, regenerating l10n cache for ZeroBanner (duration: 33m 57s)
  • 01:02 awight: update crm from 3d657972029ea221b321470102c99ad74027b6f7 to 28b68e23b670fe52a401659bde800b64d05e25bf
  • 01:00 logmsgbot: maxsem Started scap: Noop, regenerating l10n cache for ZeroBanner
  • 00:46 logmsgbot: maxsem Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/179028/ (duration: 00m 05s)
  • 00:41 logmsgbot: maxsem Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/178990/ (duration: 00m 05s)
  • 00:38 logmsgbot: maxsem Synchronized php-1.25wmf11/extensions/WikimediaEvents/: (no message) (duration: 00m 06s)
  • 00:35 logmsgbot: maxsem Synchronized php-1.25wmf11/resources/Resources.php: https://gerrit.wikimedia.org/r/#/c/179014/ (duration: 00m 06s)
  • 00:34 logmsgbot: maxsem Synchronized php-1.25wmf11/extensions/Flow/: https://gerrit.wikimedia.org/r/#q,179018,n,z (duration: 00m 07s)
  • 00:33 logmsgbot: maxsem Synchronized php-1.25wmf11/extensions/Flow/: https://gerrit.wikimedia.org/r/#q,179020,n,z (duration: 00m 07s)
  • 00:31 logmsgbot: maxsem Synchronized php-1.25wmf12/extensions/WikimediaEvents/: https://gerrit.wikimedia.org/r/#q,179018,n,z (duration: 00m 05s)
  • 00:15 logmsgbot: maxsem Synchronized php-1.25wmf11/extensions/MobileFrontend/: (no message) (duration: 00m 06s)
  • 00:13 logmsgbot: maxsem Synchronized php-1.25wmf12/extensions/MobileFrontend/: (no message) (duration: 00m 08s)
  • 00:00 logmsgbot: yurik Finished scap: ZeroBanner had some i18n changes, plus bits seems to be out of sync for it (duration: 20m 01s)

December 10

  • 23:40 logmsgbot: yurik Started scap: ZeroBanner had some i18n changes, plus bits seems to be out of sync for it
  • 23:19 logmsgbot: yurik Synchronized php-1.25wmf12/extensions/ZeroPortal/: updating ZeroPortal to master (duration: 00m 06s)
  • 23:19 logmsgbot: yurik Synchronized php-1.25wmf12/extensions/ZeroBanner/: updating ZeroBanner to master (duration: 00m 05s)
  • 23:14 logmsgbot: yurik Synchronized php-1.25wmf11/extensions/ZeroBanner/: updating ZeroBanner to master (duration: 00m 06s)
  • 22:38 logmsgbot: bd808 Synchronized wmf-config/logging-labs.php: Beta monolog config (I76d9953) (duration: 00m 05s)
  • 21:20 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.25wmf12
  • 21:16 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias to 1.25wmf11
  • 21:15 Reedy: manually ran scap-rebuild-cdbs on mw1176
  • 21:10 logmsgbot: reedy Finished scap: testwiki to 1.25wmf12 (duration: 43m 21s)
  • 21:07 Reedy: got 21:05:27 sudo -u mwdeploy -n -- /srv/deployment/scap/scap/bin/scap-rebuild-cdbs on mw1176 returned [255]: Error reading response length from authentication socket. Permission denied (publickey). from mw1176
  • 20:27 logmsgbot: reedy Started scap: testwiki to 1.25wmf12
  • 18:32 cmjohnson: replacing disk slot 4 db1015
  • 18:26 cmjohnson: replacing disk 0 db1010
  • 16:53 godog: reinstalling graphite1001 as graphite1002
  • 16:28 Coren: authdns-update to merge in https://gerrit.wikimedia.org/r/178860
  • 16:23 godog: swapping sdm on ms-be2013 / ms-be2014 / ms-be2015
  • 15:28 ottomata: initiated replica election since analytics1021 timed out zk connection again (I had hoped we were done with this :( )
  • 15:05 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1037, warm up (duration: 00m 09s)
  • 14:00 paravoid: running dpkg --remove-architecture i386 (trusty); rm /etc/dpkg/dpkg.cfg.d/multiarch (precise) across the whole fleet with the exception of gallium/lanthanum
  • 11:46 _joe_: cleaning and vacuuming the HHVM cache on a few hosts
  • 09:00 _joe_: cleaning and vacuuming the hhvm repo on mw1030
  • 08:38 logmsgbot: ori Synchronized php-1.25wmf11/extensions/CommonsMetadata: (no message) (duration: 00m 07s)
  • 08:10 logmsgbot: ori Synchronized php-1.25wmf11/extensions/CommonsMetadata/TemplateParser.php: Update CommonsMetadata for cherry-picks (duration: 00m 05s)
  • 07:49 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1007 (duration: 00m 05s)
  • 07:18 springle: upgrade db1007 trusty
  • 05:19 springle: s6 xtrabackup clone db1015 to db1037
  • 05:09 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1037 (duration: 00m 06s)
  • 04:06 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Dec 10 04:06:39 UTC 2014 (duration 6m 38s)
  • 02:26 logmsgbot: LocalisationUpdate completed (1.25wmf11) at 2014-12-10 02:26:38+00:00
  • 02:26 logmsgbot: l10nupdate Synchronized php-1.25wmf11/cache/l10n: (no message) (duration: 00m 01s)
  • 02:19 logmsgbot: LocalisationUpdate completed (1.25wmf10) at 2014-12-10 02:19:10+00:00
  • 02:19 logmsgbot: l10nupdate Synchronized php-1.25wmf10/cache/l10n: (no message) (duration: 00m 03s)
  • 01:59 ori: manually ran /etc/cron.daily/logrotate on fluorine
  • 01:58 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1015 RT 9027 (duration: 00m 06s)
  • 01:46 awight: update crm from d22dce0a375be3c5f32afc472fff550a5edf6a1e to 3d657972029ea221b321470102c99ad74027b6f7
  • 01:31 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1039, warm up (duration: 00m 05s)
  • 01:12 logmsgbot: ori Synchronized php-1.25wmf11/extensions/WikimediaEvents: Ie9ca5d3: Update WikimediaEvents for cherry-picks (duration: 00m 07s)
  • 00:58 awight: update crm from 94997a37a6531f2f1d5074895d5fa2da947e03f0 to d22dce0a375be3c5f32afc472fff550a5edf6a1e
  • 00:54 logmsgbot: catrope Synchronized php-1.25wmf11/extensions/VisualEditor: SWAT (duration: 00m 06s)
  • 00:54 logmsgbot: catrope Synchronized php-1.25wmf11/extensions/WikimediaEvents: SWAT (duration: 00m 05s)
  • 00:54 springle: upgrade db1039 trusty
  • 00:47 logmsgbot: ori Synchronized php-1.25wmf10/extensions/Math: Ic438b307a3b46: Fix for fatal caused by static call to MathRenderer::getError (duration: 00m 07s)
  • 00:42 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: Set $wgWMEStatsdBaseUri for WikimediaEvents / statsv (duration: 00m 07s)
  • 00:34 logmsgbot: catrope Synchronized php-1.25wmf11/extensions/WikimediaEvents: SWAT: sendBeacon experiment (duration: 00m 05s)
  • 00:34 logmsgbot: catrope Synchronized php-1.25wmf10/extensions/WikimediaEvents: SWAT: sendBeacon experiment (duration: 00m 06s)
  • 00:18 K4-713: updated payments to 21afbcf24c0e2124f78
  • 00:17 logmsgbot: catrope Synchronized php-1.25wmf11/includes/api/ApiOpenSearch.php: SWAT: fix empty LinkBatch in opensearch (duration: 00m 05s)
  • 00:16 logmsgbot: catrope Synchronized wmf-config/InitialiseSettings.php: Subpages for Archive talk on officewiki (duration: 00m 06s)

December 9

  • 23:56 logmsgbot: ebernhardson Synchronized wmf-config/InitialiseSettings.php: Flow enable the non NS_* talk namespaces (duration: 00m 07s)
  • 23:38 logmsgbot: ebernhardson Synchronized wmf-config/: Disable LQT on officewiki (duration: 00m 05s)
  • 23:33 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1039 (duration: 00m 06s)
  • 23:04 logmsgbot: ebernhardson Synchronized wmf-config/InitialiseSettings.php: Flow enable the rest of officewiki talk namespaces (duration: 00m 09s)
  • 22:57 logmsgbot: ori Synchronized php-1.25wmf11/includes/api/ApiStashEdit.php: (no message) (duration: 00m 05s)
  • 22:55 logmsgbot: ebernhardson Synchronized wmf-config/InitialiseSettings.php: Flow enable category talk namespace on officewiki (duration: 00m 08s)
  • 22:53 logmsgbot: ebernhardson Synchronized php-1.25wmf11/extensions/Flow: Bump flow in 1.25wmf11 for officewiki import fixes (duration: 00m 07s)
  • 22:49 logmsgbot: ebernhardson Synchronized wmf-config/InitialiseSettings.php: Flow enable 4 pages on cawiki (duration: 00m 05s)
  • 22:34 logmsgbot: aaron Synchronized php-1.25wmf11/includes/page/WikiPage.php: dff1662755d828675e5ae119b1987ace10865693 (duration: 00m 06s)
  • 22:33 logmsgbot: aaron Synchronized php-1.25wmf11/includes/api/ApiStashEdit.php: dff1662755d828675e5ae119b1987ace10865693 (duration: 00m 06s)
  • 22:27 logmsgbot: ebernhardson Synchronized wmf-config/InitialiseSettings.php: enable flow on three namespaces on officewiki (duration: 00m 06s)
  • 22:24 logmsgbot: ebernhardson Synchronized wmf-config/InitialiseSettings.php: enable flow on three namespaces on officewiki (duration: 00m 05s)
  • 22:20 logmsgbot: ori Synchronized php-1.25wmf11/includes/page/WikiPage.php: undo: (hack) $useCache = true (duration: 00m 07s)
  • 22:18 logmsgbot: ori Synchronized php-1.25wmf11/includes/page/WikiPage.php: (hack) $useCache = true (duration: 00m 06s)
  • 22:17 logmsgbot: ebernhardson Synchronized php-1.25wmf11/extensions/Flow/: Push flow updates for officewiki deploy (duration: 00m 08s)
  • 22:11 logmsgbot: ebernhardson Synchronized wmf-config/InitialiseSettings.php: enable flow on cawiki (duration: 00m 06s)
  • 22:01 logmsgbot: ori Synchronized php-1.25wmf11/includes/api/ApiStashEdit.php: I5c296325: Various edit stash fixes (duration: 00m 06s)
  • 20:53 legoktm: ran update revision set rev_page="8555535" where rev_page="6628330"; on frwiki
  • 20:48 legoktm: ran update revision set rev_page="8555529" where rev_page="1469156"; on frwiki (for T76979)
  • 20:46 YuviPanda: started /usr/local/bin/dumpwikidatajson.sh on snapshot1003 per hoo, after killing php processes from earlier start as well as from the earlier botched kill
  • 20:44 ottomata: renaming all webrequest varnishkafka instances
  • 20:37 YuviPanda: started /usr/local/bin/dumpwikidatajson.sh on snapshot1003 per hoo, to re-start dump script aborted earlier
  • 20:33 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 06s)
  • 20:28 logmsgbot: reedy Synchronized multiversion/: cdb bump (duration: 00m 05s)
  • 20:25 Reedy: ran sync-common on mw1203
  • 20:24 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: non wikipedias to 1.25wmf11
  • 20:20 _joe_: repooled the last api servers
  • 20:18 Coren: gave a+r to /etc/ssh/ssh_known_hosts on tin and iron
  • 20:15 Reedy: mw1203 seems to be down
  • 20:14 logmsgbot: reedy Synchronized wmf-config/CommonSettings.php: touch (duration: 01m 07s)
  • 20:12 logmsgbot: reedy Synchronized wmf-config/CommonSettings.php: touch (duration: 01m 03s)
  • 20:00 awight: update crm from 77e99a530b7c3910ca521923d97830df08a4d1b1 to 94997a37a6531f2f1d5074895d5fa2da947e03f0
  • 19:25 Reedy: that is a lie, 266 hosts failed
  • 19:24 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: non wikipedias to 1.25wmf11
  • 18:21 _joe_: api 100% on HHVM now
  • 18:21 _joe_: depooling mw1207-8, repooling mw1203-06
  • 17:31 csteipp: patched for T77624
  • 17:27 logmsgbot: csteipp Synchronized php-1.25wmf11/extensions/Listings/Listings.body.php: (no message) (duration: 00m 07s)
  • 17:07 godog: powercycle ms-be1012, no console
  • 16:42 _joe_: depooling mw1201-04
  • 16:41 _joe_: repooling mw1194-1200
  • 16:34 logmsgbot: anomie Synchronized wmf-config/CommonSettings-labs.php: Deploy some Labs-only changes so they're not showing as undeployed (duration: 00m 05s)
  • 16:31 logmsgbot: anomie Synchronized php-1.25wmf11/extensions/Wikidata/: SWAT: Fix issue with json dump and sites caching in Wikidata gerrit:178533 (duration: 00m 15s)
  • 16:29 logmsgbot: anomie Synchronized php-1.25wmf10/includes/filerepo/file/File.php: SWAT: Fix for broken thumbnails when the file width is in $wgThumbnailBucket gerrit:178529 (duration: 01m 04s)
  • 16:18 logmsgbot: anomie Synchronized php-1.25wmf11/includes/filerepo/file/File.php: SWAT: Fix for broken thumbnails when the file width is in $wgThumbnailBucket gerrit:178531 (duration: 00m 08s)
  • 15:36 qchris: restarted EventLogging's m2 writer on vanadium. Events did not get written into the database.
  • 15:22 _joe_: repooling mw1190-93, depooling mw1194-1200
  • 14:59 _joe_: repooled mw1147-48
  • 14:42 YuviPanda: killed wikidata dump process (/usr/local/bin/dumpwikidatajson.sh) per hoo
  • 13:59 _joe_: repooling mw1140-46, depooling mw114[78], mw119[0-3]
  • 13:00 _joe_: repool mw1133-39, depooling mw1140-46
  • 12:01 _joe_: repooling mw1125-1132, depooling mw1133-39
  • 11:30 godog: restarting diamond on trusty hosts via salt
  • 11:19 Reedy: [10:43:54] <_joe_> !log repooling mw1120-25, depooling mw1126-32
  • 11:15 springle: repool db1010, warm up
  • 11:15 springle: kicked morebots
  • 08:13 _joe_: depooling mw1115-1119 from the api pool, reimaging
  • 08:06 _joe_: restarting diamond on all appservers
  • 06:01 logmsgbot: ori Synchronized php-1.25wmf11/extensions/Math/MathInputCheckTexvc.php: Fix for fatal caused by static call to MathRenderer::getError (duration: 00m 06s)
  • 06:01 logmsgbot: ori Synchronized php-1.25wmf10/extensions/Math/MathInputCheckTexvc.php: Fix for fatal caused by static call to MathRenderer::getError (duration: 00m 06s)
  • 04:27 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Dec 9 04:27:31 UTC 2014 (duration 27m 30s)
  • 04:18 springle: upgrade db1010 trusty
  • 03:33 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1010 (duration: 00m 08s)
  • 02:23 logmsgbot: LocalisationUpdate completed (1.25wmf11) at 2014-12-09 02:23:35+00:00
  • 02:23 logmsgbot: l10nupdate Synchronized php-1.25wmf11/cache/l10n: (no message) (duration: 00m 01s)
  • 02:17 logmsgbot: LocalisationUpdate completed (1.25wmf10) at 2014-12-09 02:17:11+00:00
  • 02:17 logmsgbot: l10nupdate Synchronized php-1.25wmf10/cache/l10n: (no message) (duration: 00m 03s)
  • 01:49 logmsgbot: ori Synchronized wmf-config/InitialiseSettings.php: Set $wgAjaxEditStash to false while API cluster is on Zend (duration: 00m 06s)
  • 01:24 K4-713: updated payments to 0e92713c0d6e5
  • 00:13 logmsgbot: maxsem Synchronized php-1.25wmf10/extensions/Wikidata/: https://gerrit.wikimedia.org/r/#q,178374,n,z (duration: 00m 13s)
  • 00:12 logmsgbot: maxsem Synchronized php-1.25wmf10/extensions/VisualEditor/: https://gerrit.wikimedia.org/r/#q,178371,n,z (duration: 00m 07s)
  • 00:09 logmsgbot: maxsem Synchronized php-1.25wmf11/extensions/Wikidata/: https://gerrit.wikimedia.org/r/#q,178375,n,z (duration: 00m 13s)
  • 00:09 logmsgbot: maxsem Synchronized php-1.25wmf11/extensions/VisualEditor/: https://gerrit.wikimedia.org/r/#q,178372,n,z (duration: 00m 07s)
  • 00:08 logmsgbot: maxsem Synchronized php-1.25wmf11/extensions/MobileFrontend/: https://gerrit.wikimedia.org/r/#q,177942,n,z (duration: 00m 08s)
  • 00:02 logmsgbot: maxsem Synchronized wmf-config/: https://gerrit.wikimedia.org/r/178240 (duration: 00m 05s)

December 8

  • 23:15 K4-713: revlocked payments wiki to 30f15865bc4efe3b2b
  • 22:09 awight: update crm from c9b733f0963a04ab1174ede0d5641e9b884747c8 to 77e99a530b7c3910ca521923d97830df08a4d1b1
  • 21:52 awight: updated tools from 06e69f0bd1a1f74eb8055f5300b48ad3b78eedea to 88b57fea517d2232e8ae906df550f426b6574f24
  • 20:20 awight: updated crm adfbbecbf949932932a3b6bc8c20c15e2a8054b2 to c9b733f0963a04ab1174ede0d5641e9b884747c8
  • 19:40 logmsgbot: krinkle Synchronized php-1.25wmf11/resources/src/startup.js: touch for T47877 (duration: 00m 06s)
  • 19:19 csteipp: deployed patches for T77028 and T76686
  • 19:13 logmsgbot: ori Finished scap: I5a7e258d2: Optimize how user options are delivered to the client (duration: 26m 45s)
  • 18:46 logmsgbot: ori Started scap: I5a7e258d2: Optimize how user options are delivered to the client
  • 18:04 YuviPanda: removed restbase/ from graphite for T77172 on tungsten
  • 17:54 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: Unset \$wgTidyInternal (duration: 00m 07s)
  • 17:52 manybubbles: rebuilding eswiki's cirrus index to pick up fix for slow prefix searches
  • 17:02 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: enable tidy extension on mw1081 (duration: 00m 06s)
  • 16:56 ottomata: doing controlled restart of kafka broker analytics1021, and then initiating replica election to bring it back into leadership
  • 16:21 hashar: Jenkins: disconnected / reconnected gallium slave from the web interface. It was stuck and unable to run the mediawiki/vagrant postmerge doc job
  • 16:08 logmsgbot: demon Synchronized php-1.25wmf11/extensions/VisualEditor: (no message) (duration: 00m 06s)
  • 16:08 logmsgbot: demon Synchronized php-1.25wmf10/extensions/VisualEditor: (no message) (duration: 00m 09s)
  • 15:03 hashar: Broke zuul-cloner by mistake
  • 14:36 godog: reboot graphite1001 for kernel upgrade
  • 13:11 hashar: Restarting zuul and zuul-merger on gallium
  • 13:10 hashar: Zuul: rebasing our fork to bring some upstream changes
  • 10:41 godog: upload diamond 3.5-2 to trusty-wikimedia
  • 09:18 andrewbogott: the failure looked like this: "Unexpected error in mod_passenger: Could not connect to the ApplicationPool server: Broken pipe (32)"
  • 09:18 andrewbogott: graceful'd apache on virt1000 -- resolving a mysterious puppetmaster outage
  • 05:08 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1049, warm up (duration: 00m 06s)
  • 04:41 springle: upgrade db1049 trusty
  • 04:17 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1049 (duration: 00m 06s)
  • 04:16 springle: also truncated puppet fact_values, see http://projects.puppetlabs.com/issues/9225 and https://tickets.puppetlabs.com/browse/PUP-1173
  • 04:11 springle: puppet fact_values hit auto_inc limit. altered table to restart from 1 to get puppet running (seems safe, but needs checking, maybe also truncate)
  • 03:42 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Dec 8 03:42:28 UTC 2014 (duration 42m 27s)
  • 02:15 logmsgbot: LocalisationUpdate completed (1.25wmf11) at 2014-12-08 02:15:53+00:00
  • 02:15 logmsgbot: l10nupdate Synchronized php-1.25wmf11/cache/l10n: (no message) (duration: 00m 01s)
  • 02:10 logmsgbot: LocalisationUpdate completed (1.25wmf10) at 2014-12-08 02:10:20+00:00
  • 02:10 logmsgbot: l10nupdate Synchronized php-1.25wmf10/cache/l10n: (no message) (duration: 00m 03s)

December 7

  • 03:41 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Dec 7 03:41:52 UTC 2014 (duration 41m 51s)
  • 02:16 logmsgbot: LocalisationUpdate completed (1.25wmf11) at 2014-12-07 02:16:30+00:00
  • 02:16 logmsgbot: l10nupdate Synchronized php-1.25wmf11/cache/l10n: (no message) (duration: 00m 02s)
  • 02:11 logmsgbot: LocalisationUpdate completed (1.25wmf10) at 2014-12-07 02:11:07+00:00
  • 02:11 logmsgbot: l10nupdate Synchronized php-1.25wmf10/cache/l10n: (no message) (duration: 00m 01s)

December 6

  • 22:51 ori: restarted apache on palladium
  • 19:48 Krinkle: Made trivial edit to Jenkins language config to purge the French invasion (default language: en-us -> en-US)
  • 19:39 Krinkle: Jenkins has been conquered by the French again
  • 03:42 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Dec 6 03:42:37 UTC 2014 (duration 42m 36s)
  • 02:18 logmsgbot: LocalisationUpdate completed (1.25wmf11) at 2014-12-06 02:18:52+00:00
  • 02:18 logmsgbot: l10nupdate Synchronized php-1.25wmf11/cache/l10n: (no message) (duration: 00m 01s)
  • 02:13 logmsgbot: LocalisationUpdate completed (1.25wmf10) at 2014-12-06 02:13:12+00:00
  • 02:13 logmsgbot: l10nupdate Synchronized php-1.25wmf10/cache/l10n: (no message) (duration: 00m 01s)
  • 01:00 logmsgbot: demon Synchronized w/robots.php: better mtime (duration: 00m 06s)
  • 00:49 logmsgbot: ori Synchronized wmf-config/InitialiseSettings.php: Add preview log group (duration: 00m 06s)

December 5

  • 18:22 YuviPanda: restart gitblit on antimony
  • 18:03 akosiaris: uploaded apertium-eo-en, apertium-id-ms to apt.wikimedia.org
  • 16:44 ottomata: rebooting analytics1021
  • 15:51 mutante: hack-fixed http://noc.wikimedia.org/db.php
  • 15:46 _joe_: repooling all the servers
  • 15:06 _joe_: depooling mw1209-1220, enabling hyperthreading and upgrading
  • 14:11 paravoid: rebooting copper
  • 13:01 _joe_: repooling all appservers
  • 12:32 Krinkle: Reloading Zuul to deploy I9515542a1ac2ff
  • 11:48 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1059, warm up (duration: 00m 07s)
  • 11:46 _joe_: repooled mw1161-mw1170, depooling mw1171-80
  • 10:58 springle: upgrade db1059 trusty
  • 10:56 _joe_: depooling mw1161-mw1170 for enabling hyperthreading
  • 10:42 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1059 (duration: 00m 06s)
  • 10:24 _joe_: repooling the appservers
  • 09:21 _joe_: depooling some appservers for maintenance/upgrades
  • 06:56 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1027, warm up (duration: 00m 05s)
  • 05:02 springle: upgrade db1027 trusty
  • 04:43 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1027 (duration: 00m 07s)
  • 04:14 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Dec 5 04:14:34 UTC 2014 (duration 14m 33s)
  • 03:30 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1060, warm up (duration: 00m 06s)
  • 02:55 springle: upgrade db1060 trusty
  • 02:35 springle: manual sync-common on mw1203 (after apparently transient sync-file network error on tin)
  • 02:26 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1060 (duration: 00m 08s)
  • 02:20 logmsgbot: LocalisationUpdate completed (1.25wmf11) at 2014-12-05 02:20:57+00:00
  • 02:20 logmsgbot: l10nupdate Synchronized php-1.25wmf11/cache/l10n: (no message) (duration: 00m 01s)
  • 02:16 logmsgbot: LocalisationUpdate completed (1.25wmf10) at 2014-12-05 02:16:49+00:00
  • 02:16 logmsgbot: l10nupdate Synchronized php-1.25wmf10/cache/l10n: (no message) (duration: 00m 01s)
  • 02:01 logmsgbot: awight Synchronized php-1.25wmf11/extensions/CentralNotice: rollback Googlebot cloaking (duration: 00m 08s)
  • 02:01 logmsgbot: awight Synchronized php-1.25wmf10/extensions/CentralNotice: rollback Googlebot cloaking (duration: 00m 05s)
  • 01:31 logmsgbot: awight Synchronized php-1.25wmf10/extensions/CentralNotice: rollback CentralNotice 'improvement' (duration: 00m 05s)
  • 01:31 logmsgbot: awight Synchronized php-1.25wmf11/extensions/CentralNotice: rollback CentralNotice 'improvement' (duration: 00m 09s)
  • 00:42 logmsgbot: maxsem Synchronized search-redirect.php: Second attempt (duration: 00m 05s)
  • 00:42 logmsgbot: maxsem Synchronized search-redirect.php: https://gerrit.wikimedia.org/r/#/c/177665/ (duration: 00m 06s)
  • 00:35 logmsgbot: maxsem Synchronized wmf-config/: https://gerrit.wikimedia.org/r/#/c/177708/ (duration: 00m 07s)
  • 00:32 logmsgbot: maxsem Synchronized wmf-config/: https://gerrit.wikimedia.org/r/#/c/177600/ (duration: 00m 06s)
  • 00:28 logmsgbot: maxsem Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/173083/ (duration: 00m 05s)
  • 00:22 logmsgbot: maxsem Synchronized wmf-config/: https://gerrit.wikimedia.org/r/#/c/177660/ (duration: 00m 06s)
  • 00:17 logmsgbot: maxsem Synchronized wmf-config/: https://gerrit.wikimedia.org/r/#/c/177494/ (duration: 00m 07s)
  • 00:09 logmsgbot: maxsem Synchronized php-1.25wmf10/extensions/Wikidata/: https://gerrit.wikimedia.org/r/#q,177693,n,z (duration: 00m 11s)
  • 00:09 logmsgbot: maxsem Synchronized php-1.25wmf11/extensions/Wikidata/: https://gerrit.wikimedia.org/r/#q,177693,n,z (duration: 00m 14s)
  • 00:04 logmsgbot: maxsem Synchronized php-1.25wmf11/extensions/VisualEditor/: https://gerrit.wikimedia.org/r/#q,177643,n,z (duration: 00m 07s)

December 4

  • 22:36 logmsgbot: awight Synchronized php-1.25wmf11/extensions/CentralNotice: push CentralNotice features (duration: 00m 06s)
  • 22:35 logmsgbot: awight Synchronized php-1.25wmf10/extensions/CentralNotice: push CentralNotice features (duration: 00m 08s)
  • 22:01 YuviPanda: nodejs sucks.
  • 21:36 logmsgbot: awight Synchronized wmf-config: Shortening CentralNotice close box to 1 week (duration: 00m 07s)
  • 21:16 awight: pushed CentralNotice fixes to hide from Google
  • 21:16 logmsgbot: awight Synchronized php-1.25wmf11/extensions/CentralNotice: Hide CentralNotice banners from Google (duration: 00m 07s)
  • 21:16 logmsgbot: awight Synchronized php-1.25wmf10/extensions/CentralNotice: Hide CentralNotice banners from Google (duration: 00m 06s)
  • 21:08 ori: restarted ganglia-monitor on mw1081
  • 20:56 legoktm: removeInvalidEmails.php finished, removed a total of 218,598 emails
  • 20:17 logmsgbot: awight Synchronized robots.txt: Disallow banner stuff in robots.txt (duration: 00m 07s)
  • 19:16 legoktm: running removeInvalidEmails.php across all wikis
  • 18:57 logmsgbot: demon Synchronized wmf-config/StartProfiler.php: (no message) (duration: 00m 05s)
  • 18:33 legoktm: ran removeInvalidEmails.php on testwiki
  • 18:28 bd808: Rolling restart of the elasticsearch cluster for logstash did not fix corrupted logstash-2014.11.30 index. It was worth a shot.
  • 18:23 bd808: restarted elasticsearch on logstash1002
  • 17:28 bd808: restarted elasticsearch on logstash1003 to see if the missing indices would recover
  • 16:28 bd808: restarted elasticsearch on logstash1001.
  • 14:40 godog: upgrade to diamond 3.5 on trusty hosts in esams
  • 14:31 godog: upgrade to diamond 3.5 on trusty hosts in esams
  • 14:14 hashar: Jenkins: didn't have to restart it after all; I cancelled a few jobs and it went back to processing jobs.
  • 14:12 hashar: Jenkins in deadlock, restarting it ( https://phabricator.wikimedia.org/T72597 )
  • 13:45 _joe_: repooling mw1081
  • 13:10 _joe_: depooling mw1081 to activate hyperthreading
  • 12:09 paravoid: powercycling rhenium, kernel locked up
  • 11:24 godog: upgrade to diamond 3.5 on trusty hosts in ulsfo
  • 10:44 godog: upgrade diamond to 3.5 on all trusty machines in codfw
  • 10:33 godog: test-upgrade diamond 3.5 on swift in codfw
  • 10:01 ori: disabling puppet on mw1081 and restarting hhvm with hhvm.server.stat_cache=true to observe impact
  • 09:47 godog: upload diamond 3.5-1wmf1 to trusty-wikimedia
  • 08:12 akosiaris: disable puppet on carbon. Playing with partman :)
  • 05:13 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: confirmedit disabled on closed wikis (duration: 00m 05s)
  • 04:23 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Dec 4 04:23:26 UTC 2014 (duration 23m 25s)
  • 04:01 Tim: on mw1189: restarting hhvm
  • 02:55 logmsgbot: krinkle Synchronized w/: Ifbfb7dfd8fc0cd822b0 and I6594bc82b9de (duration: 00m 05s)
  • 02:32 logmsgbot: LocalisationUpdate completed (1.25wmf11) at 2014-12-04 02:32:18+00:00
  • 02:32 logmsgbot: l10nupdate Synchronized php-1.25wmf11/cache/l10n: (no message) (duration: 00m 01s)
  • 02:20 logmsgbot: LocalisationUpdate completed (1.25wmf10) at 2014-12-04 02:20:47+00:00
  • 02:20 logmsgbot: l10nupdate Synchronized php-1.25wmf10/cache/l10n: (no message) (duration: 00m 01s)
  • 02:09 springle: labsdb1001 upgrade & reboot
  • 02:05 awight: update crm from cd936bb433e9f107d860fb6e3da44c2ca2cb7742 to adfbbecbf949932932a3b6bc8c20c15e2a8054b2
  • 01:56 springle: labsdb1002 upgrade & reboot
  • 01:26 springle: labsdb1003 upgrade & reboot
  • 00:40 logmsgbot: maxsem Synchronized php-1.25wmf11/extensions/Wikidata/: (no message) (duration: 00m 13s)
  • 00:39 logmsgbot: maxsem Synchronized php-1.25wmf10/extensions/Wikidata/: (no message) (duration: 00m 12s)
  • 00:25 logmsgbot: maxsem Synchronized php-1.25wmf11/maintenance/removeInvalidEmails.php: https://gerrit.wikimedia.org/r/#/c/177021/ (duration: 00m 05s)
  • 00:25 logmsgbot: maxsem Synchronized php-1.25wmf10/maintenance/removeInvalidEmails.php: https://gerrit.wikimedia.org/r/#/c/177021/ (duration: 00m 05s)
  • 00:21 logmsgbot: maxsem Synchronized wmf-config/: https://gerrit.wikimedia.org/r/#/c/177271/ (duration: 00m 07s)
  • 00:11 logmsgbot: maxsem Synchronized php-1.25wmf11/extensions/WikiGrok/: https://gerrit.wikimedia.org/r/177393 (duration: 00m 05s)
  • 00:10 logmsgbot: maxsem Synchronized php-1.25wmf10/extensions/WikiGrok/: https://gerrit.wikimedia.org/r/177393 (duration: 00m 08s)
  • 00:04 logmsgbot: maxsem Synchronized wmf-config/: https://gerrit.wikimedia.org/r/#/c/177153/ , noop in production (duration: 00m 07s)

December 3

  • 23:35 logmsgbot: hoo Synchronized wmf-config/Wikibase.php: Don't lookup Sites from mc for the 'languageLinkSiteGroup' setting (duration: 00m 06s)
  • 23:19 logmsgbot: ebernhardson Synchronized php-1.25wmf10/extensions/Flow/includes/Parsoid/: (no message) (duration: 00m 05s)
  • 23:14 K4-713: updated localsettings on payments
  • 22:49 K4-713: localsettings change for payments-wiki-staging
  • 22:25 YuviPanda: repooled mw1177
  • 22:22 cscott: updated Parsoid to version 733986a6
  • 22:11 logmsgbot: ebernhardson Synchronized wmf-config/: Flow enable NS_PROJECT_TALK on officewiki (duration: 00m 07s)
  • 22:04 logmsgbot: demon Synchronized wmf-config/StartProfiler.php: xhprof & such (duration: 00m 05s)
  • 21:49 cscott: updated OCG to version 08e94b19c3f17e699d7e53d9605f65c58e17ea0e
  • 21:25 logmsgbot: reedy Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 05s)
  • 21:02 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 07s)
  • 20:49 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.25wmf11
  • 20:49 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias to 1.25wmf10
  • 20:46 logmsgbot: reedy Synchronized php-1.25wmf11/extensions/Wikidata: (no message) (duration: 00m 12s)
  • 20:46 logmsgbot: reedy Synchronized php-1.25wmf10/extensions/Wikidata: (no message) (duration: 00m 13s)
  • 20:43 logmsgbot: reedy Finished scap: testwiki to 1.25wmf11 (take 2) (duration: 36m 35s)
  • 20:07 logmsgbot: reedy Started scap: testwiki to 1.25wmf11 (take 2)
  • 19:58 logmsgbot: reedy Started scap: testwiki to 1.25wmf11 and rebuild l10n caches
  • 18:58 YuviPanda: manually killed parsoid on wtp1017, restarted with service parsoid restart
  • 18:57 YuviPanda: manually killed parsoid on wtp1009, restarted with service parsoid restart
  • 18:29 subbu: restarted parsoid to clear any cached v2 api state to prevent leakage into v1 api requests
  • 17:06 _joe_: repooling mw1108-1113
  • 16:53 YuviPanda: repooling mw1183 mw1172 mw1170 as hhvm
  • 16:48 ottomata: starting upgrade of analytics1027 to trusty, hive and oozie are offline for a bit
  • 16:41 ottomata: starting trusty upgrade of analytics1027
  • 16:41 logmsgbot: anomie Synchronized w/robots.php: Committed live hack (for real this time) (duration: 00m 05s)
  • 16:37 logmsgbot: anomie Synchronized w/robots.php: Committed live hack (duration: 00m 05s)
  • 16:35 logmsgbot: anomie Synchronized w/robots.php: Remove Content-Length from robots.txt (live hack for test, will commit or revert momentarily) (duration: 00m 07s)
  • 16:27 YuviPanda: repool mw1166 mw1165 mw1164 mw1162 mw1161
  • 16:19 logmsgbot: anomie Synchronized w/robots.php: Fix Content-Length from robots.txt (duration: 00m 06s)
  • 15:28 YuviPanda: depooling mw1161-62 for re-imaging
  • 15:20 akosiaris: rebooting mw1054 for kernel upgrade
  • 15:18 _joe_: repooling mw1101-1107, depooling mw1108-1113
  • 15:14 YuviPanda: depool mw1164-66
  • 15:12 akosiaris: reimaging mw1149-1152
  • 14:47 YuviPanda: repooled mw1173 mw1171 mw1169 mw1168 mw1167
  • 14:28 akosiaris: reimaging mw1054
  • 14:07 _joe_: depooling mw1101-mw1107
  • 13:51 YuviPanda: depooling mw1167 for re-imaging
  • 13:49 YuviPanda: depooling mw1168 for re-imaging
  • 13:13 YuviPanda: depool mw1170-3 for re-imaging
  • 13:10 YuviPanda: depool mw1177 for hhvm re-imaging
  • 11:24 akosiaris: reimaging mw1054-mw1059
  • 10:52 _joe_: repooling mw1088-1094, depooling mw1095-1100
  • 09:39 _joe_: repooling mw1045, depooling mw1088-1094
  • 09:31 _joe_: repooled mw1081-1087
  • 09:24 YuviPanda: depooled mw1045 for _joe_
  • 09:24 YuviPanda: repooled mw1174-1176, mw1178, mw1179, mw1180, mw1181
  • 09:01 YuviPanda: repooling mw1182
  • 08:30 YuviPanda: depooling mw1174-mw1176
  • 08:27 YuviPanda: depooling mw1178 for re-imaging
  • 08:27 _joe_: depooling mw1081-1087
  • 08:24 YuviPanda: depooling mw1179 for re-imaging
  • 08:19 YuviPanda: depooling mw1180 for re-imaging
  • 08:14 YuviPanda: depooling mw1181 for re-imaging
  • 08:02 YuviPanda: depooling mw1182 for re-imaging
  • 07:59 YuviPanda: depooling mw1183 for re-imaging
  • 06:41 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: db1072 full load (duration: 00m 06s)
  • 06:31 Krinkle: Reloading Zuul to deploy If499fe06e0392f4046f97f5633c08ba442649ec5
  • 04:22 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Dec 3 04:22:41 UTC 2014 (duration 22m 40s)
  • 03:40 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1072, warm up (duration: 00m 10s)
  • 03:13 springle: upgrade db1072 trusty
  • 02:32 logmsgbot: LocalisationUpdate completed (1.25wmf10) at 2014-12-03 02:32:29+00:00
  • 02:32 logmsgbot: l10nupdate Synchronized php-1.25wmf10/cache/l10n: (no message) (duration: 00m 01s)
  • 02:22 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1072 (duration: 00m 08s)
  • 02:19 logmsgbot: LocalisationUpdate completed (1.25wmf9) at 2014-12-03 02:19:21+00:00
  • 02:19 logmsgbot: l10nupdate Synchronized php-1.25wmf9/cache/l10n: (no message) (duration: 00m 02s)
  • 00:52 logmsgbot: yurik Synchronized php-1.25wmf10/extensions/ZeroPortal: updating ZeroPortal to master (duration: 00m 07s)
  • 00:34 logmsgbot: ebernhardson Synchronized wmf-config/: Turning on wgWikiGrokDebug on en BetaLabs (duration: 00m 06s)
  • 00:32 logmsgbot: ebernhardson Synchronized wmf-config/PoolCounterSettings-eqiad.php: Create new pool counter for prefix searches (duration: 00m 05s)
  • 00:31 ori: restarted apache2 on palladium
  • 00:30 logmsgbot: ebernhardson Finished scap: Bumping flow submodule in 1.25wmf10 (duration: 38m 55s)
  • 00:15 ejegg: updated tools from 0a2c365455d417b21f4ebccaf0e5e3fc5bdb887f to 06e69f0bd1a1f74eb8055f5300b48ad3b78eedea

December 2

  • 23:54 ejegg: updated tool from 113dfe160b750657626e07450003cc88d3939fbd to c8f63baf134e57680fd255874455d52efb70596f
  • 23:51 logmsgbot: ebernhardson Started scap: Bumping flow submodule in 1.25wmf10
  • 23:32 ejegg: updated crm from 68703898b7ebfb2a038f307f17788739114806e4 to cd936bb433e9f107d860fb6e3da44c2ca2cb7742
  • 22:45 YuviPanda: repooling mw118[4-7] as HHVM!
  • 22:39 _joe_: likewise on mw1121, mw1200
  • 22:34 _joe_: restarting apache on mw1110 mw1167 mw1175, stuck in apc futex
  • 21:54 YuviPanda: depooling mw1184 for re-imaging
  • 21:51 YuviPanda: depooling mw1185 for re-imaging
  • 21:49 YuviPanda: depooling mw1186 for re-imaging
  • 21:48 YuviPanda: depooling mw1187 for re-imaging
  • 21:07 YuviPanda: re-pooled mw1188
  • 20:43 mutante: added jdouglas to wmf LDAP group
  • 20:14 YuviPanda: repooled mw1209
  • 20:11 logmsgbot: aude Synchronized php-1.25wmf10/extensions/Wikidata: (no message) (duration: 00m 12s)
  • 19:52 mutante: restarted apache on mw1111
  • 19:49 logmsgbot: kaldari Synchronized wmf-config/mobile.php: Deprecating WikiGrok A/B test config vars (duration: 00m 09s)
  • 19:39 logmsgbot: reedy Synchronized php-1.25wmf9/extensions/SyntaxHighlight_GeSHi/: Fix noise in production (duration: 00m 06s)
  • 19:38 logmsgbot: kaldari Synchronized wmf-config/InitialiseSettings.php: Syncing InitialiseSettings for disabling WikiGrok on en.wiki (A/B test done) (duration: 00m 05s)
  • 19:32 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: disable error logging (duration: 00m 05s)
  • 19:27 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: Enable error log for next 10-15 minutes for the luls (duration: 00m 07s)
  • 19:26 logmsgbot: reedy Synchronized wmf-config/missing.php: CDB updates (duration: 00m 06s)
  • 19:26 logmsgbot: reedy Synchronized multiversion/: CDB updates (duration: 00m 07s)
  • 19:24 logmsgbot: reedy Synchronized wmf-config/: Config updates (duration: 00m 06s)
  • 19:23 logmsgbot: reedy Synchronized search-redirect.php: Fix undefined index spam (duration: 00m 06s)
  • 19:10 logmsgbot: reedy Synchronized wmf-config/: Wikidata config updates (duration: 00m 06s)
  • 19:10 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias to 1.25wmf10
  • 18:51 YuviPanda: depooling mw1209 for HHVM re-imaging
  • 18:35 _joe_: repooling mw1076-mw1080
  • 18:08 YuviPanda: repooling mw121[0-9] as HHVM
  • 18:06 logmsgbot: reedy Synchronized wmf-config/: noop for scap test (duration: 00m 06s)
  • 18:06 Reedy: Reverted deployment of scap 6694d147a5b757dfbc747f0732185b014e82e9bb, scap now at b8fb82eb1834e3691287a6e24f8384c6c2259710
  • 17:58 logmsgbot: reedy Synchronized wmf-config/: nooop to test scap (duration: 00m 05s)
  • 17:57 Reedy: Deployed scap @ 6694d147a5b757dfbc747f0732185b014e82e9bb
  • 17:41 K4-713: updated payments to 30f15865bc4efe3b
  • 16:58 _joe_: uploaded hhvm 3.3.0+dfsg1-1+wm5
  • 16:23 _joe_: depooling mw1071-1080
  • 16:02 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: SWAT enable jpg thumbnail chaining on commons (duration: 00m 06s)
  • 15:49 YuviPanda: repooled mw1220, re-imaging to hhvm complete
  • 15:28 cmjohnson: rebooting analytics1033 to verify bios settings
  • 14:42 YuviPanda: depooling mw1220 for HHVM re-imaging
  • 14:36 YuviPanda: depooling mw1220 to re-image as HHVM
  • 11:54 godog: remove legacy symlink /home/wikipedia/syslog from lithium
  • 11:20 _joe_: repooling mw1061,mw1066-mw1070
  • 09:54 _joe_: repooling mw1060,mw1062-65; depooling mw1067-mw1070 for reimaging
  • 08:00 _joe_: depooling mw1060-mw1067 for reimaging
  • 07:37 _joe_: repooling mw1048-1052
  • 05:11 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: Set $wgTidyInternal to false unconditionally to ease deployment of tidy extension (duration: 00m 06s)
  • 04:23 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Dec 2 04:23:23 UTC 2014 (duration 23m 22s)
  • 02:32 logmsgbot: LocalisationUpdate completed (1.25wmf10) at 2014-12-02 02:32:21+00:00
  • 02:32 logmsgbot: l10nupdate Synchronized php-1.25wmf10/cache/l10n: (no message) (duration: 00m 03s)
  • 02:19 logmsgbot: LocalisationUpdate completed (1.25wmf9) at 2014-12-02 02:19:18+00:00
  • 02:19 logmsgbot: l10nupdate Synchronized php-1.25wmf9/cache/l10n: (no message) (duration: 00m 01s)
  • 01:42 awight: updated dash to b3f4be0bbd6c16be64030607fd9c59cb84111429
  • 01:37 K4-713: updated payments to c0c4bfcdb4fa625fa52
  • 00:06 logmsgbot: maxsem Synchronized php-1.25wmf10/extensions/Wikidata/: https://gerrit.wikimedia.org/r/#/c/176837/ (duration: 00m 12s)
  • 00:04 logmsgbot: maxsem Synchronized php-1.25wmf10/extensions/VisualEditor/: https://gerrit.wikimedia.org/r/#/c/176713/ (duration: 00m 06s)

December 1

  • 23:12 mutante: terbium - running rsync in screen to copy wikimania videos to labstore1001
  • 22:51 logmsgbot: aaron Synchronized wmf-config/jobqueue-eqiad.php: b13eaa3f6e287e7268951a2f7e3798f994a20b28; comment tweaks (duration: 00m 05s)
  • 22:38 bblack: rescaled ipvs weights for text/mobile/upload/bits to 1 (there was no differential weighting), for better behavior of the 'sh' scheduler
  • 22:34 cscott-split: updated OCG to version a06e7c186796a6ee5d5af81e93688520abdf2596
  • 22:33 logmsgbot: awight Synchronized php-1.25wmf10/extensions/FundraiserLandingPage: push FundraiserLandingPage GeoIP fix (duration: 00m 06s)
  • 22:33 logmsgbot: awight Synchronized php-1.25wmf9/extensions/FundraiserLandingPage: push FundraiserLandingPage GeoIP fix (duration: 00m 06s)
  • 22:33 logmsgbot: awight Synchronized php-1.25wmf10/extensions/LandingCheck: push LandingCheck GeoIP fix (duration: 00m 06s)
  • 22:33 logmsgbot: awight Synchronized php-1.25wmf9/extensions/LandingCheck: push LandingCheck GeoIP fix (duration: 00m 06s)
  • 22:32 logmsgbot: awight Synchronized php-1.25wmf10/extensions/DonationInterface: push DonationInterface translations (duration: 00m 06s)
  • 22:32 logmsgbot: awight Synchronized php-1.25wmf9/extensions/DonationInterface: push DonationInterface translations (duration: 00m 07s)
  • 22:16 logmsgbot: awight Synchronized php-1.25wmf10/extensions/FundraiserLandingPage: push FundraiserLandingPage GeoIP fix (duration: 00m 06s)
  • 22:16 logmsgbot: awight Synchronized php-1.25wmf9/extensions/FundraiserLandingPage: push FundraiserLandingPage GeoIP fix (duration: 00m 06s)
  • 22:16 logmsgbot: awight Synchronized php-1.25wmf10/extensions/LandingCheck: push LandingCheck GeoIP fix (duration: 00m 05s)
  • 22:16 logmsgbot: awight Synchronized php-1.25wmf9/extensions/LandingCheck: push LandingCheck GeoIP fix (duration: 00m 06s)
  • 22:15 logmsgbot: awight Synchronized php-1.25wmf10/extensions/DonationInterface: push DonationInterface translations (duration: 00m 09s)
  • 22:15 logmsgbot: awight Synchronized php-1.25wmf9/extensions/DonationInterface: push DonationInterface translations (duration: 00m 07s)
  • 21:06 K4-713: updated payments to 00415dd54bec2d4cf0a
  • 20:08 logmsgbot: yurik Synchronized php-1.25wmf10/extensions/ZeroPortal/: updating ZeroPortal to master (duration: 00m 05s)
  • 20:06 logmsgbot: yurik Synchronized php-1.25wmf10/extensions/ZeroBanner/: updating ZeroBanner to master (duration: 00m 06s)
  • 20:05 logmsgbot: yurik Synchronized php-1.25wmf9/extensions/ZeroPortal/: updating ZeroPortal to master (duration: 00m 05s)
  • 20:04 logmsgbot: yurik Synchronized php-1.25wmf9/extensions/ZeroBanner/: updating ZeroBanner to master (duration: 00m 08s)
  • 20:00 logmsgbot: yurik Synchronized mobilelanding.php: https://gerrit.wikimedia.org/r/#/c/175797/ (duration: 00m 06s)
  • 19:43 logmsgbot: maxsem Synchronized php-1.25wmf10/extensions/Popups/: https://gerrit.wikimedia.org/r/#/c/176715/ (duration: 00m 06s)
  • 19:43 logmsgbot: maxsem Synchronized php-1.25wmf9/extensions/Popups/: https://gerrit.wikimedia.org/r/#/c/176715/ (duration: 00m 05s)
  • 19:41 MaxSem: Stashed Tim's uncommitted tidy-related changes on tin
  • 19:19 K4-713: updated DjangoBannerStats to 3db799dc8705c728c
  • 18:25 bblack: ulsfo LVS updated for 'sh' for SSL as well
  • 18:22 bblack: eqiad+esams LVS back to normal, with new config for 'sh' for SSL
  • 18:15 bblack: ditto on pybal 'sh' stuff for esams
  • 18:10 bblack: stopping pybal on primary eqiad LVSes to test 'sh' change for SSL (already restarted for change on backup LVSes)
  • 17:02 andrewbogott: created empty jessie-wikimedia repo on Carbon
  • 16:38 logmsgbot: demon Synchronized wmf-config/CommonSettings.php: opensearchxml conditional include (duration: 00m 06s)
  • 16:37 logmsgbot: anomie Synchronized php-1.25wmf9/extensions/SyntaxHighlight_GeSHi/geshi/geshi.php: SWAT: Fix highly recursive number highlighting regex in GeSHi (duration: 00m 07s)
  • 16:35 logmsgbot: anomie Synchronized php-1.25wmf10/extensions/SyntaxHighlight_GeSHi/geshi/geshi.php: SWAT: Fix highly recursive number highlighting regex in GeSHi (duration: 00m 10s)
  • 16:04 logmsgbot: demon Synchronized wmf-config/abusefilter.php: (no message) (duration: 00m 05s)
  • 16:04 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 05s)
  • 15:49 bd808: restarted logstash on logstash1001; log2udp events were not being processed
  • 15:26 _joe_: depooling mw1047-mw1052
  • 15:24 _joe_: repooling mw1041-mw1046
  • 14:26 _joe_: depooling mw1041-1046
  • 14:16 _joe_: repooling mw1036-mw1040
  • 13:46 _joe_: removing the same files from ocg1002,3 as well
  • 13:44 _joe_: removing cache files older than 3 days from ocg1001
  • 09:55 _joe_: reimaging mw1033-mw1040 to HHVM, depooling from the main pool now
  • 09:31 _joe_: upgrading hhvm to the latest version across the cluster
  • 04:46 logmsgbot: tstarling Synchronized php-1.25wmf10/includes/parser/MWTidy.php: change previously pulled but scap was apparently not run (duration: 00m 05s)
  • 04:44 logmsgbot: tstarling Synchronized php-1.25wmf9/includes/parser/MWTidy.php: change previously pulled but scap was apparently not run (duration: 00m 06s)
  • 03:34 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Dec 1 03:34:57 UTC 2014 (duration 34m 56s)
  • 02:17 logmsgbot: LocalisationUpdate completed (1.25wmf10) at 2014-12-01 02:17:56+00:00
  • 02:17 logmsgbot: l10nupdate Synchronized php-1.25wmf10/cache/l10n: (no message) (duration: 00m 03s)
  • 02:10 logmsgbot: LocalisationUpdate completed (1.25wmf9) at 2014-12-01 02:10:33+00:00
  • 02:10 logmsgbot: l10nupdate Synchronized php-1.25wmf9/cache/l10n: (no message) (duration: 00m 01s)
  • 00:51 logmsgbot: tstarling Synchronized wmf-config/StartProfiler.php: (no message) (duration: 00m 05s)

November 30

  • 22:51 qchris: Updated EventLogging to 19c23698bc03694017d764af33307d6f035fc224 and restarted it
  • 20:51 qchris: restarted eventlogging mysql-m2-master consumer. It seems it could no longer write to the database.
  • 19:17 Krinkle: Disabling and relaunching Gearman connection from Jenkins.
  • 10:28 logmsgbot: oblivian Synchronized wmf-config/jobqueue-eqiad.php: reverting to rdb1001 (duration: 00m 05s)
  • 10:13 mark: Rebooted asw-c4-eqiad
  • 09:13 _joe_: job queues work again
  • 09:08 logmsgbot: oblivian Synchronized wmf-config/jobqueue-eqiad.php: changing the aggregator address as well (duration: 00m 05s)
  • 07:27 _joe_: restarted the jobrunner service on all jobrunners
  • 07:15 logmsgbot: oblivian Synchronized wmf-config/jobqueue-eqiad.php: (no message) (duration: 00m 05s)
  • 05:50 ori: 3:50 UTC: switch asw-c-eqiad lost connectivity with cabinet C4. Impact: phabricator down; gap in web request logs and some perf monitoring. Job queue and Recent Changes stream OK b/c redundant servers are up.
  • 03:41 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Nov 30 03:41:03 UTC 2014 (duration 41m 2s)
  • 02:18 logmsgbot: LocalisationUpdate completed (1.25wmf10) at 2014-11-30 02:18:22+00:00
  • 02:18 logmsgbot: l10nupdate Synchronized php-1.25wmf10/cache/l10n: (no message) (duration: 00m 02s)
  • 02:11 logmsgbot: LocalisationUpdate completed (1.25wmf9) at 2014-11-30 02:11:01+00:00
  • 02:11 logmsgbot: l10nupdate Synchronized php-1.25wmf9/cache/l10n: (no message) (duration: 00m 01s)

November 29

  • 04:17 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Nov 29 04:17:41 UTC 2014 (duration 17m 40s)
  • 02:25 logmsgbot: LocalisationUpdate completed (1.25wmf10) at 2014-11-29 02:25:53+00:00
  • 02:25 logmsgbot: l10nupdate Synchronized php-1.25wmf10/cache/l10n: (no message) (duration: 00m 01s)
  • 02:13 logmsgbot: LocalisationUpdate completed (1.25wmf9) at 2014-11-29 02:13:35+00:00
  • 02:13 logmsgbot: l10nupdate Synchronized php-1.25wmf9/cache/l10n: (no message) (duration: 00m 01s)

November 28

  • 13:19 YuviPanda: restarted apache on palladium, things are recovering
  • 11:20 qchris: Updated gerrit plugin its-phabricator-from-bugzilla to 97c5f02d3ca6259488a763515251c5cc57a11a51
  • 11:20 qchris: Updated gerrit plugin its-phabricator to 9edf90a182e43bfeea7ebbcb20d4a52b6213600d
  • 03:39 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Nov 28 03:39:52 UTC 2014 (duration 39m 51s)
  • 02:22 logmsgbot: LocalisationUpdate completed (1.25wmf10) at 2014-11-28 02:22:28+00:00
  • 02:22 logmsgbot: l10nupdate Synchronized php-1.25wmf10/cache/l10n: (no message) (duration: 00m 04s)
  • 02:10 logmsgbot: LocalisationUpdate completed (1.25wmf9) at 2014-11-28 02:10:06+00:00
  • 02:10 logmsgbot: l10nupdate Synchronized php-1.25wmf9/cache/l10n: (no message) (duration: 00m 01s)

November 27

  • 17:55 _joe_: restarted hhvm on mw1224, the alarm may have been lost in the puppet failure shower earlier
  • 17:24 godog: removed /var/lib/carbon/whisper/archived/jenkins from tungsten
  • 17:09 godog: upload txstatsd 0.7.0~bzr30-0ubuntu0+14 to precise-wikimedia on carbon
  • 16:49 godog: upload missing txstatsd 1.0.0-1 _source package_ to carbon
  • 16:48 godog: upload missing txstatsd 1.0.0-1 to carbon
  • 15:48 logmsgbot: hoo Synchronized php-1.25wmf10/extensions/Wikidata/: Fixing a data model bug + enable Statements on Properties for testwikidata (duration: 00m 12s)
  • 15:34 logmsgbot: hoo Synchronized wmf-config/Wikibase.php: Set "displayStatementsOnProperties" for wikidata/testwikidata (duration: 00m 06s)
  • 14:48 akosiaris: upgrading librsvg throughout the fleet
  • 04:35 springle: restarted squid3 on carbon, but glitches seem to be upstream
  • 04:23 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Nov 27 04:23:45 UTC 2014 (duration 23m 44s)
  • 03:45 springle: puppet failures everywhere; transient apt timeout
  • 02:33 logmsgbot: LocalisationUpdate completed (1.25wmf10) at 2014-11-27 02:33:36+00:00
  • 02:33 logmsgbot: l10nupdate Synchronized php-1.25wmf10/cache/l10n: (no message) (duration: 00m 01s)
  • 02:20 logmsgbot: LocalisationUpdate completed (1.25wmf9) at 2014-11-27 02:20:52+00:00
  • 02:20 logmsgbot: l10nupdate Synchronized php-1.25wmf9/cache/l10n: (no message) (duration: 00m 02s)
  • 00:30 logmsgbot: catrope Synchronized php-1.25wmf10/extensions/VisualEditor: SWAT (duration: 00m 06s)
  • 00:27 ejegg: set TY batch size=400
  • 00:09 awight: disabled thank-you activity records
  • 00:09 logmsgbot: catrope Synchronized php-1.25wmf10/extensions/VisualEditor: SWAT (duration: 00m 07s)
  • 00:09 logmsgbot: catrope Synchronized php-1.25wmf9/extensions/VisualEditor: SWAT (duration: 00m 05s)
  • 00:07 logmsgbot: catrope Synchronized wmf-config/InitialiseSettings.php: Re-enable VisualEditor on frwiktionary and svwiktionary (duration: 00m 06s)

November 26

  • 23:57 ejegg: set TY batch size=700
  • 23:48 YuviPanda: manually ran puppet merge on strontium; puppet merge on palladium didn't sync
  • 23:46 ejegg: enabling TY mail send
  • 23:45 ejegg: set TY batch size=1
  • 23:44 ejegg: updated crm from 96f66e6b6c947c4e4c32c4a4a32dc940dc3b1d60 to 68703898b7ebfb2a038f307f17788739114806e4
  • 23:38 hashar: Jenkins all happy after a restart. Crashing to bed
  • 22:56 hashar: Killing Jenkins, it is deadlocked beyond repair
  • 22:46 hashar: Jenkins still in deadlock, will hard restart Jenkins and Zuul soonish.
  • 22:38 ejegg: disabled TY email sending
  • 22:36 ejegg: enabled TY email sending
  • 22:34 ejegg: enabled CiviMail record creation for TY emails
  • 22:24 cscott: restarted ocg
  • 22:24 logmsgbot: gwicke Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 05s)
  • 22:22 hashar: Jenkins executors are in deadlock ( https://phabricator.wikimedia.org/T72597 )
  • 22:17 hashar: Bah, there can only be one mediawiki-core-doxygen-publish job running at a time; with all the merges that happened on mediawiki/core due to the release, there are currently six of them in the queue. They will all be processed eventually
  • 22:14 hashar: mediawiki/core postmerge changes are stuck because mediawiki-core-doxygen-publish refuses to start. Attempted to retrigger them by promoting a change: gallium$ zuul promote --pipeline postmerge --changes 175960,1
  • 22:13 ejegg: updated crm from d0a51250d2bdbf3c818ec0486af284691c7a61ff to 96f66e6b6c947c4e4c32c4a4a32dc940dc3b1d60
  • 22:08 hashar: investigating Zuul/Jenkins. Jenkins potentially has a deadlock
  • 22:02 cscott: updated Parsoid to version 67e2596c
  • 21:52 hashar: Restarting Gearman client. I am in a meeting, will clean up later.
  • 21:33 bd808: restarted logstash on logstash1001; log2udp events not being received
  • 21:22 ejegg: disabled ty sending
  • 21:15 hashar: Zuul stuck, restarting Gearman client
  • 21:01 bd808: restarted elasticsearch on logstash1002 for OOM
  • 21:00 bd808: restarted elasticsearch on logstash1003 for OOM
  • 20:57 bd808: All three elasticsearch nodes in the logstash cluster think logstash1003 is master but logstash-2014.11.26 is not allocated on any node
  • 20:56 ejegg: enabled queue consumers
  • 20:51 cscott: updated OCG to version 7d8f2b8bd496464041e3ef9c092732457cc8f7ef (did not restart ocg)
  • 20:50 logmsgbot: reedy Synchronized php-1.25wmf10: (no message) (duration: 00m 47s)
  • 20:30 logmsgbot: reedy Purged l10n cache for 1.25wmf6
  • 20:29 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.25wmf10
  • 20:28 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias to 1.25wmf9
  • 20:24 logmsgbot: reedy Finished scap: testwiki to 1.25wmf10 and build l10n cache (duration: 49m 03s)
  • 20:14 ejegg: updated crm from e13cae8c418d29ef444899e0a70bbe03f4b7079d to d0a51250d2bdbf3c818ec0486af284691c7a61ff
  • 20:13 ejegg: disabling queue consumers
  • 19:35 logmsgbot: reedy Started scap: testwiki to 1.25wmf10 and build l10n cache
  • 18:18 logmsgbot: ebernhardson Synchronized wmf-config/InitialiseSettings.php: Whitelist converted lqt pages on officewiki (duration: 00m 07s)
  • 16:54 logmsgbot: gwicke Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 05s)
  • 16:34 Krinkle: Changed Jenkins default language from "en_US" to "en" ("Ignore browser settings" was already enabled). Not sure why, but it's back to English now.
  • 16:16 logmsgbot: marktraceur Synchronized wmf-config/: [SWAT] [config] 174793 Enable VisualEditor as a Beta Feature on most remaining wikis (duration: 00m 06s)
  • 16:13 Krinkle: Jenkins is displaying everything in French (both logged-in/logged-out users alike)
  • 16:11 logmsgbot: marktraceur Synchronized php-1.25wmf9/extensions/Flow/: [SWAT] [wmf9] 175941 "Provide user to local LQT api calls" for officewiki. (duration: 00m 08s)
  • 12:25 godog: stopped ocg on ocg1*
  • 12:18 godog: restarting ocg on ocg1001
  • 12:00 godog: removing pdf files older than 14d from ocg100*
  • 11:57 godog: removing pdf files older than 14d from ocg1001
  • 06:48 logmsgbot: tstarling Synchronized w/oauth-headers.php: (no message) (duration: 00m 05s)
  • 06:43 logmsgbot: tstarling Synchronized w/oauth-headers.php: (no message) (duration: 00m 06s)
  • 06:40 logmsgbot: tstarling Synchronized live-1.5/oauth-headers.php: (no message) (duration: 00m 05s)
  • 06:34 logmsgbot: tstarling Synchronized php-1.25wmf9/extensions/OAuth/lib/OAuth.php: (no message) (duration: 00m 05s)
  • 06:09 logmsgbot: tstarling Synchronized php-1.25wmf9/extensions/OAuth/lib/OAuth.php: (no message) (duration: 00m 06s)
  • 06:07 logmsgbot: tstarling Synchronized php-1.25wmf9/extensions/OAuth/lib/OAuth.php: (no message) (duration: 00m 06s)
  • 05:15 logmsgbot: tstarling Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 06s)
  • 04:24 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Nov 26 04:24:14 UTC 2014 (duration 24m 13s)
  • 02:30 logmsgbot: LocalisationUpdate completed (1.25wmf9) at 2014-11-26 02:30:20+00:00
  • 02:30 logmsgbot: l10nupdate Synchronized php-1.25wmf9/cache/l10n: (no message) (duration: 00m 01s)
  • 02:18 logmsgbot: LocalisationUpdate completed (1.25wmf8) at 2014-11-26 02:18:29+00:00
  • 02:18 logmsgbot: l10nupdate Synchronized php-1.25wmf8/cache/l10n: (no message) (duration: 00m 03s)
  • 01:23 bd808: restarted logstash on logstash1001; no events from log2udp relay being recorded
  • 00:48 logmsgbot: aaron Synchronized wmf-config/StartProfiler.php: Remove obsolete profiling settings (duration: 00m 06s)
  • 00:33 springle: power down db2033 for reassignment to codfw frack
  • 00:04 qchris: restarted eventlogging mysql-m2-master consumer. It seems it could no longer write to the database.

November 25

  • 23:06 Tim: on osmium: removing stale static pcre and zip libraries in /usr/local , installed by hhvm
  • 22:50 logmsgbot: ebernhardson Finished scap: Bump Echo and Flow in 1.25wmf9 for officewiki deployment (duration: 30m 17s)
  • 22:20 logmsgbot: ebernhardson Started scap: Bump Echo and Flow in 1.25wmf9 for officewiki deployment
  • 22:20 logmsgbot: ebernhardson Synchronized php-1.25wmf9/extensions/Echo/: Bump Echo in 1.25wmf9 (duration: 00m 08s)
  • 21:55 logmsgbot: ejegg Synchronized wmf-config/CommonSettings.php: Turn CN client-side banner choice back on everywhere (duration: 00m 05s)
  • 21:39 logmsgbot: ejegg Synchronized php-1.25wmf8/extensions/CentralNotice/: One more CentralNotice fix to get out ahead of the winter rush - wmf8 (duration: 00m 07s)
  • 21:22 logmsgbot: ejegg Synchronized wmf-config/CommonSettings.php: Turn CN client-side banner choice back on for selected wmf9 wikis (duration: 00m 05s)
  • 21:15 logmsgbot: ejegg Synchronized php-1.25wmf9/extensions/CentralNotice/: One more CentralNotice fix to get out ahead of the winter rush (duration: 00m 05s)
  • 20:53 Nemo_bis: 100% packet loss between esams and r1fra1.core.init7.net
  • 20:16 logmsgbot: reedy Synchronized php-1.25wmf9/extensions/Wikidata: Ic070ce0beb142e100490940fddaa0bd36b8a50be (duration: 00m 14s)
  • 20:09 logmsgbot: reedy Synchronized php-1.25wmf8/extensions/Wikidata: Ensure my sanity (duration: 00m 13s)
  • 19:49 bd808: restarted elasticsearch on logstash1002 after OOM
  • 19:38 logmsgbot: reedy Synchronized wmf-config/: Config updates (duration: 00m 06s)
  • 19:32 Reedy: Created wikilove tables on zhwikivoyage
  • 19:28 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non Wikipedias to 1.25wmf9
  • 19:27 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non Wikipedias to 1.25wmf9
  • 19:18 logmsgbot: reedy Synchronized php-1.25wmf8/extensions/Wikidata: I08946aac3 (duration: 00m 12s)
  • 19:18 logmsgbot: reedy Synchronized php-1.25wmf9/extensions/CentralNotice: Ib4d23f2a588f58ef3abcbd8b0b500ad8534723cd (duration: 00m 06s)
  • 19:17 logmsgbot: reedy Synchronized php-1.25wmf8/extensions/CentralNotice: Ib4d23f2a588f58ef3abcbd8b0b500ad8534723cd (duration: 00m 07s)
  • 18:28 csteipp: deployed patches for T74222 and T72901
  • 17:42 _joe_: repooled mw1019-1032,mw1053 in the appservers pool
  • 17:13 _joe_: depooled mw1019-1032 from the hhvm pool
  • 17:07 logmsgbot: bd808 Synchronized wmf-config/logging-labs.php: Update labs logging config (I1843dfd) (duration: 00m 06s)
  • 17:04 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: $wgPercentHHVM = 0 (duration: 00m 05s)
  • 17:00 logmsgbot: anomie Synchronized php-1.25wmf9/includes/api: SWAT: API: Work around wfMangleFlashPolicy() gerrit:175596 (duration: 00m 06s)
  • 16:53 logmsgbot: anomie Synchronized php-1.25wmf9/includes: SWAT: Make calling wfMangleFlashPolicy configurable gerrit:175598 (duration: 00m 09s)
  • 16:44 logmsgbot: bd808 Synchronized wmf-config/logging-labs.php: Update labs logging config (Ib8d8f8e) (duration: 00m 06s)
  • 16:36 logmsgbot: bd808 Synchronized wmf-config/logging-labs.php: Update labs logging config (Iaab0047) (duration: 00m 06s)
  • 16:24 logmsgbot: ori Synchronized wmf-config/InitialiseSettings.php: Ibd888465: Remove HHVM beta feature (duration: 00m 05s)
  • 16:14 _joe_: pooling mw1237-1258 in the appserver pool
  • 15:03 godog: upload bcache-tools 1.0.7-1 to carbon
  • 12:15 _joe_: pooling mw1221-mw1226 in the API pool
  • 06:37 YuviPanda: restarted apache on strontium, was seeing transient puppetmaster fails
  • 06:08 mutante: in response to jenkins login issue reported by krinkle: /var/lib/jenkins/xml.config on gallium had "virt1000" value for LDAP, earlier Andrew made a switch from there to ldap-eqiad. fixed config, restarted jenkins
  • 06:06 mutante: restarted gitblit
  • 04:31 jgage: restarted jenkins
  • 04:22 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Nov 25 04:22:50 UTC 2014 (duration 22m 49s)
  • 03:53 Krinkle: Jenkins is unable to create new user sessions. Suspect LDAP is having issues.
  • 03:16 springle: m2 db1020 rebuilt, but blocked from dbproxy1002 until replag=0
  • 03:12 logmsgbot: awight Synchronized wmf-config: Disabling CentralNotice client banner choice due to T75812 (duration: 00m 05s)
  • 02:31 logmsgbot: LocalisationUpdate completed (1.25wmf9) at 2014-11-25 02:31:48+00:00
  • 02:31 logmsgbot: l10nupdate Synchronized php-1.25wmf9/cache/l10n: (no message) (duration: 00m 01s)
  • 02:18 logmsgbot: LocalisationUpdate completed (1.25wmf8) at 2014-11-25 02:18:52+00:00
  • 02:18 logmsgbot: l10nupdate Synchronized php-1.25wmf8/cache/l10n: (no message) (duration: 00m 01s)
  • 02:15 mutante: old-bugzilla now behind varnish too, cert issue should be gone
  • 02:07 bblack: all LVS back to normal runtime state w/ new SSL config
  • 01:56 bblack: switching off pybal on primary LVS in esams for HTTPS check
  • 01:54 bblack: switching off pybal on primary LVS in eqiad for HTTPS check
  • 01:51 bblack: esams+eqiad backup LVS converted to new ssl config (lvs100[45] + lvs300[34])
  • 01:43 logmsgbot: awight Synchronized php-1.25wmf9/extensions/CentralNotice: push CentralNotice updates (duration: 00m 05s)
  • 01:42 logmsgbot: awight Synchronized php-1.25wmf8/extensions/CentralNotice: push CentralNotice updates (duration: 00m 06s)
  • 01:21 bblack: disabling puppet on lvs[13]00[1-6] for SSL-related changes
  • 01:15 K4-713: disabling fredge consumer
  • 01:03 bblack: puppet back to normal on caches
  • 00:59 logmsgbot: bd808 Synchronized wmf-config/logging-labs.php: Update labs logging config (duration: 00m 05s)
  • 00:31 K4-713: updated payments to 3e3cda8f07af9f7f7
  • 00:25 logmsgbot: maxsem Synchronized php-1.25wmf8/extensions/MobileFrontend/: https://gerrit.wikimedia.org/r/175611 (duration: 00m 05s)
  • 00:22 logmsgbot: maxsem Synchronized php-1.25wmf9/extensions/MobileFrontend/: https://gerrit.wikimedia.org/r/175613 (duration: 00m 05s)
  • 00:20 bblack: puppet disabled on prod text/mobile/bits/upload varnishes for careful SSL changes
  • 00:07 logmsgbot: maxsem Synchronized wmf-config/logging-labs.php: https://gerrit.wikimedia.org/r/#/c/175604/ labs only (duration: 00m 05s)

November 24

  • 23:36 ori: gallium: rm -f'd /srv/ssd/jenkins-slave/workspace/mwext-DonationInterface-testextension/src/vendor/.git/HEAD.lock
  • 23:11 logmsgbot: legoktm Synchronized php-1.25wmf8/extensions/NavigationTiming/README.md: Update NavigationTiming https://gerrit.wikimedia.org/r/175585 (duration: 00m 05s)
  • 23:05 logmsgbot: legoktm Synchronized php-1.25wmf9/extensions/NavigationTiming: Update NavigationTiming for https://gerrit.wikimedia.org/r/#/c/175584/ (duration: 00m 05s)
  • 23:01 logmsgbot: legoktm Synchronized README: Updating README https://gerrit.wikimedia.org/r/175579 (duration: 00m 05s)
  • 22:50 bblack: opening up access to labs/private repo in gerrit perms
  • 22:44 logmsgbot: yurik Synchronized mobilelanding.php: https://gerrit.wikimedia.org/r/#/c/175550/ (duration: 00m 05s)
  • 22:13 awight: enabled client banner choice config everywhere
  • 22:13 logmsgbot: awight Synchronized wmf-config: Enable CentralNotice 2.5.0 client banner choice, everywhere (duration: 00m 05s)
  • 22:11 logmsgbot: awight Synchronized php-1.25wmf9/extensions/CentralNotice: push CentralNotice updates (duration: 00m 06s)
  • 22:10 awight: pushing CentralNotice patches
  • 22:10 logmsgbot: awight Synchronized php-1.25wmf8/extensions/CentralNotice: push CentralNotice updates (duration: 00m 06s)
  • 21:32 YuviPanda: restarted gitblit on antimony
  • 21:26 andrewbogott: restarting pdns on virt1000 and labcontrol2001
  • 21:26 andrewbogott: restarting opendj on labcontrol2001 and neptunium
  • 21:23 andrewbogott: stopping opendj service on virt1000
  • 21:22 andrewbogott: disabled ldap replication on virt1000
  • 19:01 ejegg: updated tools from b537e2ec80d16b84f8e0539d4e3d78c8afef1b63 to 113dfe160b750657626e07450003cc88d3939fbd
  • 16:38 andrewbogott: moved virt1000* certs out of /etc/ssl to verify that they are no longer used
  • 16:30 logmsgbot: bd808 Synchronized wmf-config/logging-labs.php: Revert monolog logging config (duration: 00m 05s)
  • 16:24 logmsgbot: manybubbles Synchronized wmf-config/: SWAT update config for stash limit in upload wizard (duration: 00m 06s)
  • 16:22 logmsgbot: manybubbles Synchronized php-1.25wmf9/extensions/BounceHandler/: SWAT update bounce handler to use right db (duration: 00m 06s)
  • 16:15 logmsgbot: manybubbles Synchronized wmf-config/logging-labs.php: SWAT update for labs - should be noop in production (duration: 00m 06s)
  • 14:44 manybubbles: restarting the elasticsearch server didn't cause any hiccups. Rolling restart should be totally ok.
  • 14:21 manybubbles: performing test restart of elastic1002 to see what a rolling restart would be like while serving enwiki's searches
  • 12:51 _joe_: restarting mw1230, with hyperthreading enabled
  • 12:37 qchris: Added gerrit plugin its-phabricator-from-bugzilla (f9fd2db7a62119ab9a6d1adfd3110b6e59b7a872)
  • 10:58 godog: moved jenkins.ci under archived.jenkins.ci on tungsten, see T1075
  • 10:42 godog: backfilling old txstatsd metrics from / to statsd/ on tungsten
  • 03:31 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Nov 24 03:31:25 UTC 2014 (duration 31m 24s)
  • 02:17 logmsgbot: LocalisationUpdate completed (1.25wmf9) at 2014-11-24 02:17:02+00:00
  • 02:17 logmsgbot: l10nupdate Synchronized php-1.25wmf9/cache/l10n: (no message) (duration: 00m 01s)
  • 02:10 logmsgbot: LocalisationUpdate completed (1.25wmf8) at 2014-11-24 02:10:21+00:00
  • 02:10 logmsgbot: l10nupdate Synchronized php-1.25wmf8/cache/l10n: (no message) (duration: 00m 01s)

November 23

  • 22:26 ori: depooling mw1234; flapping.
  • 03:35 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Nov 23 03:35:48 UTC 2014 (duration 35m 47s)
  • 02:19 logmsgbot: LocalisationUpdate completed (1.25wmf9) at 2014-11-23 02:19:23+00:00
  • 02:19 logmsgbot: l10nupdate Synchronized php-1.25wmf9/cache/l10n: (no message) (duration: 00m 01s)
  • 02:13 logmsgbot: LocalisationUpdate completed (1.25wmf8) at 2014-11-23 02:12:55+00:00
  • 02:12 logmsgbot: l10nupdate Synchronized php-1.25wmf8/cache/l10n: (no message) (duration: 00m 01s)

November 22

  • 23:02 springle: upgrade db1020 trusty, xtrabackup clone db1046 to db1020
  • 21:21 hashar: Jenkins: disconnected/reconnected gallium slave. All executors were being busy / deadlocked
  • 20:29 springle: db1046 m2-master threadpool lockup, restarted mysqld, investigating
  • 19:39 logmsgbot: ori Synchronized wmf-config/InitialiseSettings.php: Ifae6e0ab6: Clean up indents, comments, spacing in InitialiseSettings (duration: 00m 05s)
  • 04:30 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Nov 22 04:30:53 UTC 2014 (duration 30m 51s)
  • 02:33 logmsgbot: LocalisationUpdate completed (1.25wmf9) at 2014-11-22 02:33:36+00:00
  • 02:33 logmsgbot: l10nupdate Synchronized php-1.25wmf9/cache/l10n: (no message) (duration: 00m 01s)
  • 02:20 logmsgbot: LocalisationUpdate completed (1.25wmf8) at 2014-11-22 02:20:36+00:00
  • 02:20 logmsgbot: l10nupdate Synchronized php-1.25wmf8/cache/l10n: (no message) (duration: 00m 01s)
  • 00:44 hoo: Disabled login for dewiki accounts "W" and "H"

November 21

  • 23:33 K4-713: Updated payments to 374480152a40d1b
  • 23:28 hoo: Disabled login for dewiki account "@"
  • 22:54 hoo: Disabled login for dewiki account "C"
  • 22:17 logmsgbot: ejegg Synchronized php-1.25wmf8/extensions/CentralNotice/: (no message) (duration: 00m 05s)
  • 21:45 logmsgbot: anomie Synchronized php-1.25wmf8/extensions/SecurePoll/: Backport another SecurePoll bug fix (duration: 00m 06s)
  • 21:44 logmsgbot: anomie Synchronized php-1.25wmf9/extensions/SecurePoll/: Backport another SecurePoll bug fix (duration: 00m 06s)
  • 21:32 logmsgbot: anomie Synchronized wmf-config/InitialiseSettings.php: Testing scap, no actual change (duration: 00m 05s)
  • 21:25 logmsgbot: anomie Synchronized wmf-config/InitialiseSettings.php: Testing scap, no actual change (duration: 00m 06s)
  • 21:23 logmsgbot: anomie Synchronized php-1.25wmf8/extensions/SecurePoll/: Backport SecurePoll bug fixes (duration: 00m 05s)
  • 21:22 logmsgbot: anomie Synchronized php-1.25wmf8/extensions/SecurePoll/: Backport SecurePoll bug fixes (duration: 00m 01s)
  • 21:18 logmsgbot: ori Synchronized php-1.25wmf9/extensions/SecurePoll: Backport SecurePoll bug fixes (duration: 00m 06s)
  • 21:14 logmsgbot: anomie Synchronized php-1.25wmf9/extensions/SecurePoll/: Backport SecurePoll bug fixes (duration: 00m 01s)
  • 21:10 logmsgbot: ejegg Synchronized php-1.25wmf9/extensions/CentralNotice/: (no message) (duration: 00m 07s)
  • 20:31 logmsgbot: aaron Synchronized wmf-config/InitialiseSettings.php: Removed duplicated BounceHandler log entry (duration: 00m 05s)
  • 18:16 chasemp: rebooting zirconium because bugzilla
  • 17:26 hoo: Disabled login for dewiki account "K"
  • 17:19 _joe_: apache hard restart on strontium
  • 16:59 _joe_: restarted hhvm on mw1025, TC cache exhausted
  • 16:43 _joe_: pooled mw1228-9
  • 16:33 _joe_: pooling mw1236 (HHVM) into the main apache pool
  • 16:23 _joe_: repooling mw1232-3
  • 16:17 godog: upload carbonate 0.2.2-1 to trusty-wikimedia
  • 15:56 _joe_: repooling mw1231
  • 15:52 _joe_: repooling mw1230
  • 15:45 _joe_: repooling mw1227
  • 14:57 logmsgbot: hoo Synchronized wmf-config/InitialiseSettings.php: Add 'BounceHandler' to wgDebugLogGroups (duration: 00m 05s)
  • 14:47 cmjohnson: mw1230-1233 down --reinstalling
  • 14:34 logmsgbot: hoo Synchronized wmf-config/CommonSettings.php: Also whitelist IPv6 ips for bouncehandler (duration: 00m 08s)
  • 13:48 _joe_: depooled mw1227
  • 13:08 springle: fresh dump db1046 to db2011
  • 11:39 logmsgbot: mark Synchronized wmf-config/InitialiseSettings.php: add openfashion.momu.be to wgCopyUploadsDomains (duration: 00m 06s)
  • 11:05 _joe_: pooled mw1234,mw1235 in the api pool
  • 10:40 _joe_: pooled mw1231,mw1232,mw1233 in the api pool
  • 10:33 _joe_: pooled mw1230 in the api pool
  • 10:29 _joe_: pooled mw1227 in the api pool
  • 04:19 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Nov 21 04:19:30 UTC 2014 (duration 19m 29s)
  • 02:26 logmsgbot: LocalisationUpdate completed (1.25wmf9) at 2014-11-21 02:26:31+00:00
  • 02:26 logmsgbot: l10nupdate Synchronized php-1.25wmf9/cache/l10n: (no message) (duration: 00m 02s)
  • 02:15 logmsgbot: LocalisationUpdate completed (1.25wmf8) at 2014-11-21 02:15:19+00:00
  • 02:15 logmsgbot: l10nupdate Synchronized php-1.25wmf8/cache/l10n: (no message) (duration: 00m 01s)
  • 01:24 logmsgbot: ebernhardson Synchronized php-1.25wmf8/extensions/Flow/includes/Parsoid/: Bump flow submodule in 1.25wmf8 (duration: 00m 04s)
  • 01:19 K4-713: updated payments to 4d6afa865b5e8
  • 01:02 ori: Updated scap to I5782e8cbe: Make the SSH user and authentication socket configurable
  • 00:57 qchris: disabled gerrit's hooks-bugzilla plugin (See T210)
  • 00:46 logmsgbot: catrope Synchronized php-1.25wmf9/resources/lib/oojs-ui/: SWAT (duration: 00m 03s)
  • 00:46 logmsgbot: catrope Synchronized php-1.25wmf9/extensions/Flow: SWAT (duration: 00m 05s)
  • 00:46 logmsgbot: catrope Synchronized php-1.25wmf9/extensions/MultimediaViewer: SWAT (duration: 00m 03s)
  • 00:46 logmsgbot: catrope Synchronized php-1.25wmf9/extensions/VisualEditor: SWAT (duration: 00m 04s)
  • 00:45 logmsgbot: catrope Synchronized php-1.25wmf8/resources/lib/oojs-ui/: SWAT (duration: 00m 03s)
  • 00:45 logmsgbot: catrope Synchronized php-1.25wmf8/extensions/Flow: SWAT (duration: 00m 05s)
  • 00:44 logmsgbot: catrope Synchronized php-1.25wmf8/extensions/MultimediaViewer: SWAT (duration: 00m 04s)
  • 00:44 logmsgbot: catrope Synchronized php-1.25wmf8/extensions/VisualEditor: SWAT (duration: 00m 04s)
  • 00:12 logmsgbot: catrope Synchronized wmf-config/: SWAT (again, for wmgMFCustomLogos) (duration: 00m 05s)
  • 00:07 logmsgbot: catrope Synchronized wmf-config/: SWAT (duration: 00m 08s)

November 20

  • 23:47 logmsgbot: maxsem Synchronized php-1.25wmf8/extensions/WikiGrok/: https://gerrit.wikimedia.org/r/174847 (duration: 00m 04s)
  • 23:46 logmsgbot: maxsem Synchronized php-1.25wmf9/extensions/WikiGrok/: https://gerrit.wikimedia.org/r/174847 (duration: 00m 04s)
  • 23:26 ^d: graceful'd mw1135, apc stale?
  • 23:19 ^d: running sync-common on mw1135. out of sync?
  • 23:13 logmsgbot: maxsem Synchronized php-1.25wmf9/extensions/MobileFrontend/: https://gerrit.wikimedia.org/r/#/c/174749/ (duration: 00m 04s)
  • 22:49 ejegg: rolled back payments-wiki from 1e533d6dfc200e6a84f0a8418a8a1ecddb2b3aed to e3d235f881282120409e1a6ed1a3908ce9a63c26
  • 22:31 logmsgbot: demon Synchronized wmf-config/: globaluserpage on beta, no-op sync (duration: 00m 07s)
  • 22:27 logmsgbot: LocalisationUpdate completed (1.25wmf9) at 2014-11-20 22:27:08+00:00
  • 22:23 logmsgbot: bd808 Synchronized php-1.25wmf9/cache/l10n: (no message) (duration: 08m 05s)
  • 22:03 logmsgbot: LocalisationUpdate completed (1.25wmf8) at 2014-11-20 22:03:06+00:00
  • 21:59 logmsgbot: bd808 Synchronized php-1.25wmf8/cache/l10n: (no message) (duration: 05m 05s)
  • 21:40 bd808|deploy: Testing l10nupdate changes
  • 21:38 ejegg: updated crm from ed3f3f8e31119eb7d52d5730ece4e22ac1dd055a to e13cae8c418d29ef444899e0a70bbe03f4b7079d
  • 21:33 ejegg: updated payments-wiki e3d235f881282120409e1a6ed1a3908ce9a63c26 to 1e533d6dfc200e6a84f0a8418a8a1ecddb2b3aed
  • 21:28 logmsgbot: aude Synchronized php-1.25wmf8/extensions/Wikidata: Update Wikidata - property suggester (duration: 00m 10s)
  • 21:26 logmsgbot: aude Synchronized php-1.25wmf9/extensions/Wikidata: Update test.wikidata - property suggester (duration: 00m 10s)
  • 21:00 ori: Updated EventLogging to 39de1d3faacc8463db7532405e8fc003b80ecb79
  • 20:54 ejegg: updated crm from ff89895638a0dd0600b2e4c0b6adfd1b8e402df5 to ed3f3f8e31119eb7d52d5730ece4e22ac1dd055a
  • 20:10 ejegg: updated tools from fe9b463379fac35ad5e71a57fbbb95ae39e2356e to b537e2ec80d16b84f8e0539d4e3d78c8afef1b63
  • 20:09 yuvipanda: run chown l10nupdate:wikidev /var/lock/scap on tin, for https://gerrit.wikimedia.org/r/#/c/174784/1
  • 19:42 logmsgbot: demon Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 04s)
  • 19:34 jgage: restarted puppetmasters
  • 19:30 logmsgbot: demon Finished scap: (no message) (duration: 23m 37s)
  • 19:06 logmsgbot: demon Started scap: (no message)
  • 18:51 logmsgbot: LocalisationUpdate completed (1.25wmf9) at 2014-11-20 18:51:39+00:00
  • 18:39 logmsgbot: LocalisationUpdate completed (1.25wmf8) at 2014-11-20 18:39:00+00:00
  • 18:27 andrewbogott: updated the python-openstack-wikistatus on carbon to 2014.11
  • 18:22 logmsgbot: demon Synchronized php-1.25wmf9/extensions/ExtensionDistributor/: (no message) (duration: 00m 07s)
  • 17:01 godog: reboot ms-be1007, xfs-induced high load
  • 16:59 _joe_: restart apache on mw1218, stuck in an APC futex
  • 16:38 logmsgbot: demon Synchronized php-1.25wmf9/extensions/Math: (no message) (duration: 00m 06s)
  • 16:26 _joe_: puppet reenabled everywhere, change tested and live on all varnishes within the next 20 minutes
  • 16:15 logmsgbot: demon Synchronized php-1.25wmf9/extensions/CirrusSearch: (no message) (duration: 00m 04s)
  • 16:15 logmsgbot: demon Synchronized php-1.25wmf8/extensions/CirrusSearch: (no message) (duration: 00m 05s)
  • 15:52 _joe_: disabling puppet on all caches, before a pretty large change, will be re-enabled after a few tests
  • 15:01 hashar: Restarting Jenkins AND Zuul. Beta cluster jobs are still deadlocked.
  • 14:00 godog: restart txstatsd on tungsten to stop receiving jenkins metrics
  • 13:04 hashar: Jenkins: restarting to remove a deadlock and unload the Statsd plugin
  • 12:30 godog: upload carbon-c-relay to trusty-wikimedia
  • 12:12 qchris: Restarted EventLogging mysql-m2 consumer to empty its caches
  • 05:41 bblack: amssq31-62, cp300[12], lvs300[34], ssl300[123] all shut down for esams power event (and downtimed)
  • 05:28 bblack: ~30m to esams power out, starting equipment shutdown and such for OE13/OE15
  • 05:15 Tim: made myself an administrator on phabricator
  • 04:27 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Nov 20 04:27:25 UTC 2014 (duration 27m 24s)
  • 02:34 logmsgbot: LocalisationUpdate completed (1.25wmf9) at 2014-11-20 02:34:12+00:00
  • 02:21 logmsgbot: LocalisationUpdate completed (1.25wmf8) at 2014-11-20 02:21:35+00:00
  • 00:50 logmsgbot: maxsem Synchronized php-1.25wmf9/extensions/MobileFrontend/: https://gerrit.wikimedia.org/r/#/c/174613/ (duration: 00m 04s)
  • 00:41 logmsgbot: catrope Synchronized wmf-config/InitialiseSettings.php: Change mobile wordmark image to relative URL (duration: 00m 04s)
  • 00:34 logmsgbot: catrope Synchronized php-1.25wmf9/extensions/VisualEditor: SWAT (duration: 00m 04s)
  • 00:34 logmsgbot: catrope Synchronized php-1.25wmf8/extensions/VisualEditor: SWAT (duration: 00m 04s)
  • 00:26 logmsgbot: catrope Synchronized php-1.25wmf8/includes/media/: SWAT: don't apply EXIF rotation to chained thumbnails (duration: 00m 04s)
  • 00:13 logmsgbot: catrope Synchronized wmf-config/: SWAT: temp debugging for SecurePoll (duration: 00m 04s)
  • 00:10 logmsgbot: catrope Synchronized wmf-config/: SWAT (duration: 00m 04s)
  • 00:09 logmsgbot: catrope Synchronized images/mobile/: SWAT: new Wikipedia wordmark for mobile (duration: 00m 03s)

November 19

  • 22:01 ottomata: starting trusty upgrade of analytics1041
  • 21:38 yuvipandajs: restarted txstatsd & carbon on labmon1001, recovering from missing points now
  • 21:35 logmsgbot: reedy Synchronized wmf-config/: ContactPage for legal (duration: 00m 17s)
  • 21:34 logmsgbot: reedy Synchronized docroot and w: (no message) (duration: 00m 15s)
  • 21:25 ottomata: starting trusty upgrade of analytics1040
  • 21:22 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.25wmf9
  • 21:21 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikipedias to 1.25wmf8
  • 21:18 logmsgbot: reedy Finished scap: testwiki to 1.25wmf9 and build l10n cache (duration: 105m 06s)
  • 20:57 ottomata: starting trusty upgrade of analytics1039
  • 20:23 ottomata: starting trusty upgrade of analytics1038
  • 19:49 bblack: starting the long slow process of draining out esams traffic ahead of power maint event
  • 19:34 ottomata: starting trusty upgrade of analytics1037
  • 19:33 logmsgbot: reedy Started scap: testwiki to 1.25wmf9 and build l10n cache
  • 19:30 andrewbogott: upgrading other compute hosts: virt1001-1009
  • 19:25 ori: disarming keyholder agent on tin to test alerts
  • 18:53 logmsgbot: reedy Synchronized php-1.25wmf8/extensions/Wikidata: Ie105a80aa776769eb0dae8a44cda0b7dbe018fb5 (duration: 00m 22s)
  • 18:50 andrewbogott: upgrading virt1006
  • 18:36 andrewbogott: upgrading labnet1001
  • 18:33 andrewbogott: upgraded glance on virt1000 to version icehouse
  • 18:29 ottomata: starting trusty upgrade of 1036
  • 18:18 logmsgbot: reedy Synchronized wmf-config/: BounceHandler (duration: 00m 15s)
  • 18:17 logmsgbot: reedy Synchronized php-1.25wmf8/extensions/BounceHandler/: Bump (duration: 00m 14s)
  • 17:57 andrewbogott: disabled keystone-redis because the current package doesn't work with icehouse
  • 16:36 andrewbogott: upgrading virt1000
  • 16:16 andrewbogott: moved virt1000 db backup to /a/osback because it was /way/ too big to fit in my homedir
  • 16:12 hashar: Jenkins: uninstalled Jenkins statsd plugin ( https://phabricator.wikimedia.org/T1278 ). It is overloading the statsd server with a bunch of metrics we don't care about ( https://phabricator.wikimedia.org/T1075 )
  • 15:57 andrewbogott: backed up all labs openstack databases to virt1000:~andrew/osback/havana-db-backup.sql
  • 15:49 andrewbogott: backing up labs configs in ~andrew/osback/<servicename>
  • 15:48 andrewbogott: beginning upgrade of labs OpenStack from Havana to Icehouse
  • 09:44 logmsgbot: hoo Synchronized php-1.25wmf8/extensions/Wikidata/: Fix url and commonsMedia UI editing (duration: 00m 42s)
  • 05:44 legoktm: batchCAAntiSpoof finished with "34721605 user(s) done."
  • 04:57 ejegg: updated tools from 419fb7aa32c6d0776056968378e358ee01985565 to fe9b463379fac35ad5e71a57fbbb95ae39e2356e
  • 04:42 Tim: on mw1114: testing xhprof hack for T758
  • 04:15 ori: mw1020: disabled puppet & restarted hhvm w/hhvm.eval.perf_pid_map = true to test
  • 04:15 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Nov 19 04:15:00 UTC 2014 (duration 14m 59s)
  • 02:26 logmsgbot: LocalisationUpdate completed (1.25wmf8) at 2014-11-19 02:26:12+00:00
  • 02:14 logmsgbot: LocalisationUpdate completed (1.25wmf7) at 2014-11-19 02:14:12+00:00
  • 01:28 qchris: Restarted EventLogging mysql-m2 consumer to pick up switch to dbproxy1002
  • 01:23 logmsgbot: ori Synchronized php-1.25wmf7/extensions/SyntaxHighlight_GeSHi: I788e1beb8: Update SyntaxHighlight_GeSHi for cherry-picks (duration: 00m 05s)
  • 01:20 logmsgbot: ori Synchronized php-1.25wmf8/extensions/SyntaxHighlight_GeSHi: Ibb0f7c24: Update SyntaxHighlight_GeSHi for cherry-picks (duration: 00m 05s)
  • 00:48 springle: m2-master CNAME switch to dbproxy1002, and db1046 to primary backend
  • 00:38 ori: EventLogging: deployed 423f7dd5b2b5 & restarted.
  • 00:21 logmsgbot: maxsem Synchronized wmf-config/mobile.php: https://gerrit.wikimedia.org/r/174303 (duration: 00m 05s)
  • 00:09 MaxSem: gracefulled apache on mw1205 (suspect an APC bug)
  • 00:06 ori: repooled mw1205

November 18

  • 23:34 logmsgbot: ori Synchronized wmf-config: I76f2023a1: 'Undeploy AntiBot' (duration: 00m 04s)
  • 23:19 logmsgbot: demon Synchronized wmf-config: undeploy antibot (duration: 00m 04s)
  • 22:46 bd808: Updated zuul config on gallium to include I511b14e (Make cdb-phpunit job non-voting)
  • 22:27 ^d: elasticsearch: set a template to apply auto_expand_replicas 0-2 on all newly created indexes.
  • 22:00 logmsgbot: hoo Synchronized wmf-config/Wikibase.php: Bump cache epoch (duration: 00m 07s)
  • 21:50 logmsgbot: hoo Synchronized php-1.25wmf8/extensions/Wikidata/: Fix EntityIdLabelFormatter et al. (duration: 00m 17s)
  • 21:33 ^d: elasticsearch: set auto_expand_replicas to 0-2 on ttmserver(-test) like other indexes for extra redundancy.
  • 20:25 ori: re-enabling puppet on all varnishes following deployment of Iac35f2329
  • 20:10 legoktm: running batchCAAntiSpoof.php on terbium
  • 20:02 legoktm: ran populateGlobalRenameLogSearch.php on metawiki
  • 19:47 ori: disabling Puppet on varnishes to push out Iac35f2329
  • 19:40 logmsgbot: reedy Synchronized wmf-config/Wikibase.php: bump epoch (duration: 00m 13s)
  • 19:40 ottomata: starting trusty upgrade of analytics1032
  • 19:28 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 14s)
  • 19:23 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias to 1.25wmf8
  • 19:14 ottomata: starting trusty upgrade of analytics1031
  • 19:05 ori: depooled mw1205; out of sync
  • 19:03 logmsgbot: maxsem Synchronized php-1.25wmf7/extensions/WikiGrok/: Retry sync (duration: 00m 07s)
  • 18:57 logmsgbot: maxsem Synchronized php-1.25wmf7/extensions/WikiGrok/: Revert (duration: 00m 04s)
  • 18:54 logmsgbot: maxsem Synchronized php-1.25wmf7/extensions/WikiGrok/: SQL backed version (duration: 00m 04s)
  • 18:49 logmsgbot: maxsem Synchronized php-1.25wmf8/extensions/WikiGrok/: SQL backed version (duration: 00m 05s)
  • 18:47 ejegg: updated crm from 71ec68e8da1de289c4e7adca090c0fdbccbd8b8a to ff89895638a0dd0600b2e4c0b6adfd1b8e402df5
  • 18:47 ejegg: updated crm
  • 18:42 ottomata: starting trusty upgrade of analytics1030
  • 18:42 logmsgbot: demon Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 04s)
  • 18:34 ejegg: update crm from e9e81a828d50e8bddf98eae699c925e09b25927b to 71ec68e8da1de289c4e7adca090c0fdbccbd8b8a
  • 18:31 MaxSem: created wikigrok_questions table on test, test2 and enwiki
  • 18:25 logmsgbot: maxsem Synchronized php-1.25wmf7/extensions/MobileFrontend/: https://gerrit.wikimedia.org/r/#/c/173316/ (duration: 00m 04s)
  • 18:25 logmsgbot: maxsem Synchronized php-1.25wmf8/extensions/MobileFrontend/: https://gerrit.wikimedia.org/r/#/c/173316/ (duration: 00m 08s)
  • 18:08 ottomata: starting trusty upgrade of analytics1029
  • 18:04 logmsgbot: aaron Synchronized wmf-config/StartProfiler.php: Added switch-logic for new Profiler config format (duration: 00m 05s)
  • 17:40 logmsgbot: demon Synchronized php-1.25wmf8/extensions/Translate/ttmserver/ElasticSearchTTMServer.php: hack (duration: 00m 04s)
  • 17:40 logmsgbot: demon Synchronized php-1.25wmf7/extensions/Translate/ttmserver/ElasticSearchTTMServer.php: hack (duration: 00m 04s)
  • 17:34 _joe_: restarting icinga
  • 17:24 logmsgbot: demon Synchronized php-1.25wmf7/extensions/Translate/scripts/ttmserver-export.php: profiling hack (duration: 00m 04s)
  • 17:22 ottomata: starting trusty upgrade of analytics1020
  • 17:21 logmsgbot: demon Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 05s)
  • 17:15 logmsgbot: demon Synchronized php-1.25wmf8/extensions/Translate/scripts/ttmserver-export.php: profiling hack (duration: 00m 06s)
  • 17:07 logmsgbot: demon Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 04s)
  • 17:06 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 04s)
  • 16:35 logmsgbot: anomie Synchronized php-1.25wmf8/extensions/SecurePoll: SWAT: Fix SecurePollContent handling gerrit:174125 (duration: 00m 09s)
  • 16:32 logmsgbot: anomie Synchronized php-1.25wmf8/extensions/MultimediaViewer: SWAT: Media Viewer UI bugfixes gerrit:174116 (for real this time) (duration: 00m 09s)
  • 16:30 godog: restarting txstatsd on tungsten to drop old metrics
  • 16:29 logmsgbot: anomie Synchronized php-1.25wmf8/extensions/MultimediaViewer: SWAT: Media Viewer UI bugfixes gerrit:174116 (duration: 00m 09s)
  • 16:28 ottomata: starting upgrade to trusty of analytics1017
  • 16:27 logmsgbot: anomie Synchronized php-1.25wmf8/includes/filebackend: SWAT: Log more details about backend-fail-internal errors gerrit:174128 (duration: 00m 09s)
  • 16:18 bblack: rubidium+eeden gdnsd upgraded to 2.1.0 (baham was already there)
  • 16:06 manybubbles: replaying 20,000 searches at approximately the same speed that they were issued caused only marginal bounce in load (cluster load average was 13% and two machines went about 20%). We're ready from a performance standpoint. yay
  • 16:02 logmsgbot: anomie Synchronized wmf-config/InitialiseSettings.php: Touch a random PHP file, supposedly required (duration: 00m 09s)
  • 16:02 manybubbles: replaying some searches against cirrus to make *super* *duper* sure it won't fall over tomorrow when we enable enwiki
  • 16:01 logmsgbot: anomie Synchronized visualeditor-default.dblist: SWAT: Enable VisualEditor by default on Catalan Wikiquote (cawikiquote) gerrit:174036 (duration: 00m 09s)
  • 16:01 logmsgbot: anomie Synchronized visualeditor.dblist: SWAT: Enable VisualEditor by default on Catalan Wikiquote (cawikiquote) gerrit:174036 (duration: 00m 09s)
  • 15:43 ottomata: starting trusty upgrade of analytics1016
  • 15:32 hashar: Deleting job https://integration.wikimedia.org/ci/job/mediawiki-vendor-integration/ replaced by mediawiki-phpunit. Clearing out workspaces bug 73515
  • 14:58 ottomata: starting upgrade to Trusty of analytics1015
  • 14:55 springle: fail over m2 to m2-slave (db1046); investigating db1020
  • 14:44 hashar: Gerrit web interface dead with: Cannot open ReviewDb bug 73555
  • 04:16 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Nov 18 04:16:03 UTC 2014 (duration 16m 2s)
  • 02:26 logmsgbot: LocalisationUpdate completed (1.25wmf8) at 2014-11-18 02:26:17+00:00
  • 02:22 ^d: jenkins locale set from 'en' to 'en_US' since 'en' means Italian somehow.
  • 02:14 logmsgbot: LocalisationUpdate completed (1.25wmf7) at 2014-11-18 02:14:18+00:00
  • 01:27 Tim: on osmium: removing ori's custom kernel and rebooting
  • 01:23 springle: temporarily reassign db1004 for phab migration tests
  • 00:55 logmsgbot: maxsem Synchronized php-1.25wmf8/resources/lib/oojs-ui/: https://gerrit.wikimedia.org/r/#/c/174029/ (duration: 00m 04s)
  • 00:53 bd808: Restarted logstash on logstash1002; lots of errors in the log about GELF input >128 chunks
  • 00:50 bd808: Restarted hung logstash process on logstash1001
  • 00:47 logmsgbot: maxsem Synchronized wmf-config: https://gerrit.wikimedia.org/r/173878 (duration: 00m 03s)
  • 00:44 logmsgbot: maxsem Synchronized visualeditor.dblist: https://gerrit.wikimedia.org/r/172996 (duration: 00m 03s)
  • 00:43 logmsgbot: maxsem Synchronized visualeditor-default.dblist: https://gerrit.wikimedia.org/r/172993 (duration: 00m 04s)
  • 00:34 logmsgbot: maxsem Synchronized php-1.25wmf7/extensions/VisualEditor/: SWAT (duration: 00m 04s)
  • 00:33 logmsgbot: maxsem Synchronized php-1.25wmf7/extensions/MobileFrontend/: SWAT (duration: 00m 04s)
  • 00:33 logmsgbot: maxsem Synchronized php-1.25wmf7/extensions/WikiGrok: SWAT (duration: 00m 03s)
  • 00:32 logmsgbot: maxsem Synchronized php-1.25wmf8/extensions/VisualEditor/: SWAT (duration: 00m 04s)
  • 00:24 logmsgbot: maxsem Synchronized php-1.25wmf8/extensions/MobileFrontend/: SWAT (duration: 00m 04s)
  • 00:23 logmsgbot: maxsem Synchronized php-1.25wmf8/extensions/WikiGrok/: (no message) (duration: 00m 04s)
  • 00:21 logmsgbot: maxsem Synchronized php-1.25wmf8/extensions/VisualEditor/: SWAT (duration: 00m 04s)
  • 00:21 logmsgbot: maxsem Synchronized php-1.25wmf8/extensions/Flow: SWAT (duration: 00m 05s)

November 17

  • 23:59 logmsgbot: awight Synchronized wmf-config: Enable new CentralNotice features on mediawikiwiki (duration: 00m 04s)
  • 23:35 logmsgbot: awight Synchronized php-1.25wmf8/extensions/CentralNotice: push CentralNotice updates (duration: 00m 05s)
  • 22:50 cscott: updated Parsoid to version 819b2cf4
  • 22:03 logmsgbot: awight Synchronized wmf-config: Enable new CentralNotice features on beta.wmflabs (duration: 00m 07s)
  • 20:48 logmsgbot: ejegg Synchronized wmf-config: (no message) (duration: 00m 03s)
  • 20:39 logmsgbot: ejegg Synchronized php-1.25wmf8/extensions/CentralNotice/: Update CentralNotice for client-side banner choice (duration: 00m 03s)
  • 20:02 ottomata: starting upgrade of analytics1014 to trusty
  • 17:57 ottomata: starting upgrade to trusty of analytics1013 (having trouble scheduling downtime in icinga right now)
  • 16:38 akosiaris: upload etherpad-lite_1.4.1-1 on apt.wikimedia.org
  • 16:38 logmsgbot: demon Synchronized php-1.25wmf8/extensions/SecurePoll/: (no message) (duration: 00m 05s)
  • 16:38 logmsgbot: demon Synchronized php-1.25wmf7/extensions/SecurePoll/: (no message) (duration: 00m 05s)
  • 16:11 logmsgbot: demon Synchronized php-1.25wmf8/extensions/CirrusSearch: (no message) (duration: 00m 04s)
  • 16:11 logmsgbot: demon Synchronized php-1.25wmf7/extensions/CirrusSearch: (no message) (duration: 00m 05s)
  • 16:09 hashar: Renamed job mediawiki-vendor-integration to mediawiki-phpunit bug 72787
  • 16:03 logmsgbot: demon Synchronized wmf-config/CirrusSearch-common.php: more jobs (duration: 00m 04s)
  • 14:31 hashar: Jenkins/Zuul: disconnected/reconnected Jenkins Gearman client
  • 13:38 apergos: ran puppetstoredconfigclean.rb on db1017, it must have been missed in the rename
  • 12:17 akosiaris: final reboot for xenon, cerium, praseodymium after a dist-upgrade -y
  • 11:27 logmsgbot: ori Synchronized php-1.25wmf8/includes/Import.php: Icc19961fd: 'Debugging statements to try to diagnose bug 40009' (duration: 00m 08s)
  • 11:22 logmsgbot: ori Synchronized wmf-config/InitialiseSettings.php: Ied0a7ab4b: Route Bug40009 logs to fluorine (duration: 00m 07s)
  • 09:51 _joe_: restarting mw1187, all apache children stuck in apc_pthreadmutex_lock()
  • 09:10 akosiaris: praseodymium reimaging
  • 08:53 logmsgbot: oblivian Synchronized wmf-config/CommonSettings.php: Open HHVM to 25% of anons (duration: 00m 06s)
  • 08:49 hashar: if there is any oddity with Jenkins/Zuul please poke me. I am on IRC all day today
  • 08:39 hashar: Jenkins upgraded
  • 08:30 akosiaris: reimaging cerium
  • 08:21 hashar: Upgrading Jenkins
  • 04:15 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Nov 17 04:15:03 UTC 2014 (duration 15m 2s)
  • 02:27 logmsgbot: LocalisationUpdate completed (1.25wmf8) at 2014-11-17 02:27:25+00:00
  • 02:15 logmsgbot: LocalisationUpdate completed (1.25wmf7) at 2014-11-17 02:15:14+00:00

November 16

  • 14:23 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: Disable error Loggroup (duration: 00m 18s)
  • 14:20 logmsgbot: reedy Synchronized wmf-config/CommonSettings.php: Fix Undefined index: HTTPS (duration: 00m 17s)
  • 14:15 logmsgbot: krinkle Synchronized php-1.25wmf8/includes: I8764cf5df87b (duration: 00m 10s)
  • 14:13 logmsgbot: reedy Synchronized rpc/RunJobs.php: (no message) (duration: 00m 16s)
  • 14:10 logmsgbot: krinkle Synchronized php-1.25wmf7/includes: I8764cf5df87b226 (duration: 00m 10s)
  • 13:53 logmsgbot: krinkle Synchronized wmf-config/InitialiseSettings.php: If9194b73c3256e0064ff (duration: 00m 07s)
  • 04:09 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Nov 16 04:09:17 UTC 2014 (duration 9m 16s)
  • 02:22 logmsgbot: LocalisationUpdate completed (1.25wmf8) at 2014-11-16 02:22:49+00:00
  • 02:16 logmsgbot: LocalisationUpdate completed (1.25wmf7) at 2014-11-16 02:16:32+00:00

November 15

  • 21:11 logmsgbot: reedy Synchronized database lists: update size dblists (duration: 00m 17s)
  • 16:49 logmsgbot: reedy Synchronized docroot and w: fix typo (duration: 00m 15s)
  • 16:48 logmsgbot: reedy Synchronized docroot and w: dbtree (duration: 00m 14s)
  • 16:38 logmsgbot: reedy Synchronized docroot and w: dbtree (duration: 00m 14s)
  • 14:26 YuviPanda: made reedy 'full' user on webmaster tools
  • 12:23 logmsgbot: reedy Synchronized wmf-config/missing.php: hhvm support (duration: 00m 14s)
  • 12:20 logmsgbot: reedy Synchronized multiversion/: CDB updates (duration: 00m 14s)
  • 11:32 logmsgbot: reedy Synchronized wmf-config/missing.php: Fix php short tags (duration: 00m 16s)
  • 11:31 logmsgbot: reedy Synchronized php-1.25wmf7/extensions/CommonsMetadata/: Fix warnings (duration: 00m 18s)
  • 11:31 logmsgbot: reedy Synchronized php-1.25wmf7/extensions/cldr/: Fix warnings (duration: 00m 15s)
  • 03:28 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Nov 15 03:28:47 UTC 2014 (duration 28m 46s)
  • 02:16 logmsgbot: LocalisationUpdate completed (1.25wmf8) at 2014-11-15 02:16:31+00:00
  • 02:10 logmsgbot: LocalisationUpdate completed (1.25wmf7) at 2014-11-15 02:10:32+00:00

November 14

  • 23:21 logmsgbot: kaldari Synchronized wmf-config/mobile.php: Updating WikiGrok A/B test start/end times (duration: 00m 07s)
  • 22:41 logmsgbot: ori Synchronized wmf-config/InitialiseSettings.php: If60e3fe97: Deploy Translate extension on ca.wikimedia (duration: 00m 05s)
  • 21:47 bd808: restarted /etc/init.d/ganglia-monitor on logstash1003
  • 18:44 legoktm: running scripts to fix bug 72927
  • 18:01 akosiaris: reimaging xenon
  • 16:40 bd808: Increased replica count from 0 to 2 for all logstash elasticsearch indices. Expect icinga warnings as replicas are populated.
  • 15:56 ottomata: upgrading analytics1024 to trusty
  • 15:46 ottomata: analytics1003 (a cisco) is acting crazy, stuck in some loop while trying to boot. Am attempting to fix with power cycle
  • 14:35 paravoid: cr1-ulsfo: setting up BGP with new transit provider
  • 09:16 hashar: Zuul is back
  • 09:14 hashar: Zuul is flapping
  • 07:38 jgage: logstash hosts: elasticsearch moved to bigger disks
  • 04:36 Tim: on mw1114 restarting hhvm
  • 04:17 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Nov 14 04:17:16 UTC 2014 (duration 17m 15s)
  • 04:03 jgage: logstash1002 migration to new md0 complete
  • 02:30 logmsgbot: LocalisationUpdate completed (1.25wmf8) at 2014-11-14 02:30:34+00:00
  • 02:28 Tim: progressively increasing load on mw1114, attempting to reproduce the previous overload
  • 02:19 jgage: logstash1003 elasticsearch migration to new raid0 complete
  • 02:18 logmsgbot: LocalisationUpdate completed (1.25wmf7) at 2014-11-14 02:18:01+00:00
  • 01:39 logmsgbot: kaldari Synchronized wmf-config/mobile.php: updating WikiGrok A/B test times (duration: 00m 03s)
  • 01:04 logmsgbot: kaldari Synchronized php-1.25wmf7/extensions/MobileFrontend: (no message) (duration: 00m 05s)
  • 01:04 logmsgbot: kaldari Synchronized php-1.25wmf7/extensions/WikiGrok: (no message) (duration: 00m 03s)
  • 00:49 jgage: logstash1003: migrating elasticsearch data to new raid volume
  • 00:42 logmsgbot: kaldari Synchronized wmf-config/mobile.php: Update WikiGrok A/B test times (duration: 00m 03s)
  • 00:20 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 06s)
  • 00:15 mutante: nickel - shutdown
  • 00:15 logmsgbot: demon Synchronized php-1.25wmf8/extensions/VisualEditor: (no message) (duration: 00m 04s)
  • 00:14 logmsgbot: demon Synchronized php-1.25wmf7/extensions/VisualEditor: (no message) (duration: 00m 05s)
  • 00:08 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 04s)
  • 00:06 logmsgbot: demon Synchronized wmf-config/: (no message) (duration: 00m 07s)
  • 00:04 logmsgbot: demon Synchronized php-1.25wmf8/extensions/Echo: (no message) (duration: 00m 04s)
  • 00:04 logmsgbot: demon Synchronized php-1.25wmf7/extensions/Echo: (no message) (duration: 00m 06s)

November 13

  • 23:55 mutante: nickel - remove from puppet,salt,icinga,stop services...
  • 23:52 ^d: restarted gitblit on antimony
  • 23:39 logmsgbot: kaldari Synchronized wmf-config/mobile.php: Adding WikiGrok A/B test start and end times (duration: 00m 03s)
  • 22:19 jgage: hadoop: analytics1010 is again active namenode
  • 22:14 logmsgbot: awight Synchronized php-1.25wmf8/extensions/CentralNotice: push CentralNotice updates (duration: 00m 04s)
  • 22:13 logmsgbot: awight Synchronized php-1.25wmf7/extensions/CentralNotice: push CentralNotice updates (duration: 00m 05s)
  • 22:12 qchris: restarted EventLogging jobs that write to disk, to pick up config changes
  • 22:03 jgage: failed over hadoop namenode to analytics1004
  • 21:42 logmsgbot: awight Synchronized wmf-config: Enabling CentralNotice banner choice on testwiki, take 2 (duration: 00m 06s)
  • 21:15 cscott: updated Parsoid to version dabff010
  • 20:51 cmjohnson: powering down logstash1001 to add disks
  • 20:39 cmjohnson: powering down logstash1002 to add disks
  • 20:37 awight: CentralNotice noops deployed to all wikis
  • 20:36 logmsgbot: awight Synchronized php-1.25wmf7/extensions/CentralNotice: push CentralNotice updates (duration: 00m 05s)
  • 20:33 logmsgbot: awight Synchronized wmf-config: Enabling CentralNotice banner choice on testwiki (duration: 00m 04s)
  • 20:32 bd808: Dropped replica count of all logstash indices except today to 0. Should make rolling restarts faster during hardware upgrade.
  • 20:25 logmsgbot: awight Synchronized php-1.25wmf8/extensions/CentralNotice: push CentralNotice updates (duration: 00m 05s)
  • 20:19 csteipp: patched bugs 71111 and 71394 in wmf7 and wmf8
  • 20:14 cmjohnson: powering down logstash1003 for a few mins to add disks
  • 19:52 ottomata: starting upgrade to trusty on analytics1023
  • 19:15 awight: campaigns reenabled
  • 18:55 awight: disabling CentralNotice campaigns
  • 17:49 ottomata: preparing for trusty upgrade of analytics1003
  • 16:57 bd808: dropped replica count to 0 for logstash indices from 2014-10-30 and 2014-10-31.
  • 16:49 bd808: restarted elasticsearch on logstash1002
  • 16:46 bd808: dropped replica count to 0 for logstash indices from 2014-10-14 through 2014-10-29. See https://phabricator.wikimedia.org/P73 for the commands.
  • 16:45 ottomata: preparing to upgrade analytics1026 to trusty
  • 16:21 bd808: disk utilization is 94% on logstash1002, 92% on logstash1001 and 91% on logstash1003. Too much data in indices even with replica count bumped down to 1 for the small disks we have today.
  • 16:16 bd808: logstash elasticsearch cluster is pretty messed up. logstash1002 has lost shards for all indices except for today, and it's master for that one.
  • 16:16 logmsgbot: manybubbles Synchronized php-1.25wmf8/extensions/CirrusSearch/: SWAT update cirrussearch to fix slow prefix queries (duration: 00m 05s)
  • 16:14 logmsgbot: manybubbles Synchronized wmf-config/CirrusSearch-production.php: SWAT reenable regex search now that it will not crash elasticsearch (duration: 00m 04s)
  • 16:13 logmsgbot: manybubbles Synchronized wmf-config/CirrusSearch-common.php: SWAT reenable accelerated regex search (regex search still disabled) (duration: 00m 03s)
  • 16:11 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: SWAT force summary when running checkuser query on all wikis (duration: 00m 04s)
  • 16:01 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: SWAT revert JPG thumbnail chaining on all wikis except commons (duration: 00m 05s)
  • 15:27 logmsgbot: hashar: deleted all content from https://doc.wikimedia.org/ :-( Will regenerate.
  • 15:09 godog: rolling restart of object-auditor in swift codfw/eqiad to pick up changes
  • 15:06 logmsgbot: yurik Synchronized php-1.25wmf8/extensions/ZeroPortal: updating ZeroPortal to master (duration: 01m 13s)
  • 15:04 chasemp: phabricator upgrades T1203
  • 14:43 logmsgbot: hashar: restarted zuul-merger on gallium
  • 14:42 logmsgbot: hashar: restarting Jenkins and Zuul
  • 12:45 godog: investigating high iops on swift eqiad with paravoid, stopped object-auditor on ms-be1005 and ms-be1015
  • 11:09 hashar: resurrected morebots in #wikimedia-operations (see Morebots).
  • 11:08 hashar: Killed Jenkins due to a deadlock
  • 11:08 hashar: Killing Jenkins due to a deadlock
  • 02:52 mutante: beta puppet freshness - UNKNOWN: No valid datapoints found .. since 13d
  • 02:30 logmsgbot: LocalisationUpdate completed (1.25wmf8) at 2014-11-13 02:30:00+00:00
  • 02:18 logmsgbot: LocalisationUpdate completed (1.25wmf7) at 2014-11-13 02:18:44+00:00
  • 00:46 mutante: thulium - Could not intern from pson: expected value in object at '"[PHP]\n\n; puppet:t'!

November 12

  • 21:59 logmsgbot: reedy Synchronized wmf-config/: Set useLegacyUsageIndex = true for Wikibase client (duration: 00m 17s)
  • 20:57 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.25wmf8
  • 20:55 hashar: Restarting Jenkins, deadlock on deployment-bastion
  • 20:28 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 15s)
  • 20:28 logmsgbot: reedy Synchronized database lists: (no message) (duration: 00m 15s)
  • 20:25 manybubbles: restarting elastic1021 to pick up new plugins
  • 20:21 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias to 1.25wmf7
  • 20:13 logmsgbot: reedy Finished scap: testwiki to 1.25wmf8 and build l10n cache (duration: 53m 57s)
  • 19:37 hoo: Made myself oauthadmin on mediawikiwiki
  • 19:19 logmsgbot: reedy Started scap: testwiki to 1.25wmf8 and build l10n cache
  • 19:05 mutante: installing package upgrades on bast1001 (incl. PHP version)
  • 19:04 mutante: installing package upgrades on iron
  • 18:38 YuviPanda: turned off yurik's zerosms cronjob on stat1002 (already discussed with him, he was ok with it being stopped until he could find time to fix it)
  • 17:58 _joe_: gracefulling apache on problematic API hosts
  • 17:05 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 04s)
  • 16:51 logmsgbot: anomie Synchronized php-1.25wmf7/extensions/SecurePoll/: SWAT: SecurePoll fix for jump-text and title on create/edit gerrit:172718 (for real this time) (duration: 00m 09s)
  • 16:48 logmsgbot: anomie Finished scap: SWAT: SecurePoll fix for jump-text and title on create/edit gerrit:172718 (duration: 22m 13s)
  • 16:26 logmsgbot: anomie Started scap: SWAT: SecurePoll fix for jump-text and title on create/edit gerrit:172718
  • 16:25 logmsgbot: anomie Synchronized php-1.25wmf7/extensions/MultimediaViewer/: SWAT: Backport MediaViewer options menu layout fix gerrit:172737 (duration: 00m 09s)
  • 16:04 logmsgbot: anomie Synchronized wmf-config: SWAT: Set different ImageMetrics sampling factor for logged-in users gerrit:172720 (duration: 00m 12s)
  • 16:01 logmsgbot: anomie Synchronized wmf-config/Wikibase.php: SWAT: Add "featured portal" badge (Q17580674) gerrit:172729 (duration: 00m 10s)
  • 14:55 logmsgbot: oblivian Synchronized wmf-config/CommonSettings.php: Open HHVM to 20% of anons (duration: 00m 06s)
  • 14:27 manybubbles: restarting elastic1016 to pick up new plugins... halfway done
  • 14:24 _joe_: load test on hhvm done
  • 13:55 godog: rolling reload of swift on ms-be1* to pick up statsd changes
  • 13:32 godog: rolling reload of swift on ms-fe1* to pick up statsd changes
  • 10:28 _joe_: repooling mw1189 with a reduced hhvm thread count for testing (puppet disabled, as well)
  • 10:16 _joe_: depooling mw1189 from the api pool for reimaging
  • 08:17 _joe_: stress testing a group of HHVM servers in anticipation of the move to 20% of traffic
  • 03:35 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Nov 12 03:35:17 UTC 2014 (duration 35m 16s)
  • 02:15 logmsgbot: LocalisationUpdate completed (1.25wmf7) at 2014-11-12 02:15:49+00:00
  • 02:09 logmsgbot: LocalisationUpdate completed (1.25wmf6) at 2014-11-12 02:09:14+00:00

November 11

  • 17:12 cscott: removed old ocg cronjobs on ocg100x; see https://bugzilla.wikimedia.org/show_bug.cgi?id=73166
  • 16:48 logmsgbot: reedy Synchronized wmf-config: Use Texvc filter if available (duration: 00m 15s)
  • 14:23 logmsgbot: reedy Synchronized private/PrivateSettings.php: Add $wmgVERPsecret for BounceHandler (duration: 00m 14s)
  • 14:09 logmsgbot: reedy Synchronized php-1.25wmf7/extensions/BounceHandler/: (no message) (duration: 00m 15s)
  • 14:04 logmsgbot: reedy Synchronized php-1.25wmf6/vendor/: (no message) (duration: 00m 15s)
  • 14:04 logmsgbot: reedy Synchronized php-1.25wmf6/extensions/BounceHandler/: (no message) (duration: 00m 14s)
  • 13:41 logmsgbot: reedy Synchronized wmf-config: (no message) (duration: 00m 14s)
  • 13:35 godog: rolling reload on ms-be2* to pick up statsd changes
  • 13:12 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias to 1.25wmf7
  • 13:11 logmsgbot: reedy Purged l10n cache for 1.25wmf5
  • 13:08 YuviPanda: deleting tons of junk data generated by interaction between txstatsd and the labs graphite archiver on labmon1001
  • 04:29 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Nov 11 04:29:32 UTC 2014 (duration 29m 31s)
  • 02:41 logmsgbot: LocalisationUpdate completed (1.25wmf7) at 2014-11-11 02:40:58+00:00
  • 02:28 logmsgbot: LocalisationUpdate completed (1.25wmf6) at 2014-11-11 02:28:21+00:00
  • 00:21 logmsgbot: catrope Synchronized php-1.25wmf7/extensions/Flow: SWAT (duration: 00m 05s)
  • 00:21 logmsgbot: catrope Synchronized php-1.25wmf7/extensions/VisualEditor: SWAT (duration: 00m 04s)

November 10

  • 23:22 paravoid: reprepro: include src:libmaxminddb, src:geoipupdate for precise/trusty
  • 22:14 cscott: updated Parsoid to version b61475196
  • 21:49 cscott: updated OCG to version d9855961b18f550f62c0b20da70f95847a215805
  • 21:36 mutante: powercycling frozen stat1002
  • 18:42 manybubbles: restarting remaining elasticsearch boxes in sequence to pick up new plugins
  • 18:30 godog: reboot db1017 to pick up an updated kernel
  • 18:29 logmsgbot: ori Synchronized php-1.25wmf6/includes/ChangeTags.php: Iec9befeba: Hide HHVM tag on Special:{Contributions,RecentChanges,...} (duration: 00m 05s)
  • 18:29 logmsgbot: ori Synchronized php-1.25wmf7/includes/ChangeTags.php: Iec9befeba: Hide HHVM tag on Special:{Contributions,RecentChanges,...} (duration: 00m 06s)
  • 17:52 manybubbles: restart elastic1002 to pick up new plugins
  • 17:16 manybubbles: elastic1001 finished restarting. letting it soak up shards for a few minutes to make sure restart was ok. then we'll plow through the others
  • 17:02 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 05s)
  • 16:52 manybubbles: restarting elastic1001 to pick up new plugins.
  • 16:50 manybubbles: deployed new versions of elasticsearch plugins to fix regex querying
  • 16:48 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: touch (duration: 00m 14s)
  • 16:03 logmsgbot: anomie Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable JPG thumbnail chaining on all wikis except commons gerrit:172254 (duration: 00m 09s)
  • 16:01 logmsgbot: anomie Synchronized wmf-config/Wikibase.php: SWAT: Enable experimental Wikidata features on labs gerrit:172239 (duration: 00m 09s)
  • 15:50 logmsgbot: reedy Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 14s)
  • 11:54 logmsgbot: oblivian Synchronized wmf-config/CommonSettings.php: Open HHVM to 15% of anons (duration: 00m 06s)
  • 10:06 _joe_: upgraded hhvm on the whole cluster
  • 09:19 hashar: Restarting Jenkins to Java 7
  • 09:17 _joe_: upgrading hhvm across the fleet with new package with debug symbols
  • 09:14 hashar: Jenkins: switching from Java 6 to Java 7 153764
  • 09:02 _joe_: repooling mw1189 at reduced load
  • 08:52 _joe_: dist-upgrading mw1189 to use the latest kernel available, then rebooting
  • 08:32 YuviPanda: ran mklost+found on /srv/postgres for reducing cronspam
  • 08:21 paravoid: force-rebooting ms-be2011, kernel "xfs stuck"
  • 01:55 ori: depooled mw1114 after it became unresponsive, likely <https://phabricator.wikimedia.org/T1195>

November 9

  • 23:44 hoo: Changed the email for a global account. Bug 73014.
  • 21:56 _joe_: depooling mw1189 from the api pool, see https://phabricator.wikimedia.org/T1194
  • 19:12 _joe_: restarted apache on mw1192, this time a hard restart
  • 17:11 hoo: mw1192 stuck with almost no idle workers as most workers are in the "Gracefully finishing" state. Attempted to gracefully restart it, but that (to no surprise) didn't help.

November 8

  • 20:17 Krinkle: Jenkins/Zuul was still stuck. Disconnected and relaunched slave agents on lanthanum and gallium. This fixed it (slaves in labs were fine).
  • 20:01 Krinkle: Jenkins/Zuul appear stuck. Disconnect/Re-enable Gearman from Jenkins.
  • 15:30 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: Gerrit I46b151ff: Reverting addition of Draft namespace to enwiki (duration: 00m 04s)
  • 11:09 YuviPanda: ran mklost+found on /srv/postgres on labsdb1004 to kill cronspam
  • 11:07 YuviPanda: ran mklost+found on /srv/postgres on labsdb1007 to kill cronspam

November 7

  • 20:53 logmsgbot: ebernhardson Synchronized php-1.25wmf7/extensions/Flow: Bump flow submodule for bug 71858 (duration: 00m 08s)
  • 20:33 logmsgbot: ori Synchronized php-1.25wmf6/includes/WebResponse.php: I569b2ebbc: Add WebResponse::getHeader() (duration: 00m 09s)
  • 20:13 logmsgbot: ori Synchronized php-1.25wmf7/includes/WebResponse.php: I569b2ebbc: Add WebResponse::getHeader() (duration: 00m 07s)
  • 20:03 YuviPanda: restarted gitblit on antimony
  • 19:36 YuviPanda: upgraded php5-fss to 1.0-2 on virt1000 to prevent cronspam
  • 19:36 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 04s)
  • 16:27 godog: shut db1017 briefly for cmjohnson to look
  • 14:46 _joe_: installing hhvm package built with full debug symbols on mw1114
  • 09:00 logmsgbot: oblivian Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 07s)
  • 08:00 _joe_: powercycled mw1169, console unresponsive, not responding to pings
  • 07:52 _joe_: killed the master apache process on mw1191, stuck in a futex wait, restarted apache
  • 07:44 _joe_: upgrading the hhvm appservers to the new package version, it seems stable enough
  • 01:01 logmsgbot: reedy Synchronized php-1.25wmf7/extensions/Flow/: (no message) (duration: 00m 16s)
  • 00:49 logmsgbot: reedy Synchronized php-1.25wmf6/includes/api/: (no message) (duration: 00m 14s)
  • 00:36 logmsgbot: reedy Synchronized php-1.25wmf7/extensions/MobileFrontend/: (no message) (duration: 00m 16s)
  • 00:27 logmsgbot: reedy Synchronized php-1.25wmf7/extensions/VisualEditor/: (no message) (duration: 00m 15s)
  • 00:26 logmsgbot: reedy Synchronized php-1.25wmf6/extensions/GeoData: (no message) (duration: 00m 14s)
  • 00:03 Reedy: running foreachwikiindblist wikidataclient.dblist extensions/Wikidata/extensions/Wikibase/lib/maintenance/populateSitesTable.php --strip-protocols
  • 00:00 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: touch (duration: 00m 14s)
  • 00:00 logmsgbot: reedy Synchronized langlist: mai (duration: 00m 14s)

November 6

  • 23:26 bd808: deleted corrupt mediawiki/core clone in workspace/mwext-MobileFrontend-qunit-mobile on gallium
  • 23:24 bd808: Killed 3 hung /usr/local/bin/logstash_optimize_index.sh processes on logstash1002
  • 23:22 bd808: restarted logstash on logstash1002
  • 23:21 bd808: restarted logstash on logstash1003
  • 23:02 bd808: restarted logstash on logstash1001 for the usual reason (no events making it to elasticsearch)
  • 22:19 subbu: updated parsoid to d23d2be6 (+ a hotfix to the production localsettings config file)
  • 21:26 ottomata: added Range header field to varnishkafka webrequest logs
  • 19:45 andrewbogott: restarted ntp on labstore1001
  • 19:33 manybubbles: manybubbles is done with SWAT
  • 16:11 logmsgbot: manybubbles Synchronized php-1.25wmf7/extensions/MultimediaViewer/: SWAT revert layout changes (duration: 00m 06s)
  • 16:02 logmsgbot: manybubbles Synchronized wmf-config/: SWAT deploy some beta configs. Should be noop. (duration: 00m 04s)
  • 15:49 _joe_: load-testing hhvm, in particular the servers with the new package
  • 15:43 _joe_: upgrading mw1031,mw1032 to the new package, no crashes seen since reinstall
  • 15:24 manybubbles: finished with performance testing for cirrus - new servers look like way way more than enough power
  • 15:01 manybubbles: dewiki is fine. trying enwiki.
  • 14:57 manybubbles: performance test for zhwiki was good. trying dewiki
  • 14:55 manybubbles: running performance test for Cirrus taking zhwiki
  • 12:09 akosiaris: Depool wtp1001, wtp1003-1006 for trusty upgrade
  • 10:07 _joe_: temporarily raising the weight of mw1018 and mw1030 in pybal to load-test them and check for crashes
  • 09:53 _joe_: installing the new hhvm package on mw1030 and mw1018 in order to test for stability
  • 02:05 awight: CRM: drush vset maintenance_mode 1
  • 01:21 Tim: restarted gmond on mw1018 and mw1031
  • 01:06 mutante: git-sync-upstream on deployment-salt for beta puppetmaster
  • 00:56 awight: disabling all queue consumers.
  • 00:28 logmsgbot: maxsem Synchronized php-1.25wmf6/extensions/MobileFrontend/: (no message) (duration: 00m 04s)
  • 00:24 logmsgbot: maxsem Synchronized php-1.25wmf6/extensions/MobileFrontend/: (no message) (duration: 00m 07s)
  • 00:20 logmsgbot: maxsem Synchronized php-1.25wmf7/extensions/VisualEditor/: SWAT (duration: 00m 07s)
  • 00:18 logmsgbot: maxsem Synchronized php-1.25wmf6/extensions/MobileFrontend/: SWAT (duration: 00m 04s)
  • 00:18 logmsgbot: maxsem Synchronized php-1.25wmf6/extensions/Flow/: SWAT (duration: 00m 05s)
  • 00:13 andrewbogott: ocg1001 is depressingly tiny and will probably keep complaining about disk space until it's rebuilt
  • 00:09 andrewbogott: cleaned up some log files on ocg1001 and reduced logrotations to 7.

November 5

  • 22:02 ejegg: updated fraud scoring
  • 21:24 subbu: redeployed parsoid deploy sha 66befe47 (with the right bunyan log level that unbreaks VE)
  • 21:13 subbu: deployed parsoid version 978623eb
  • 21:05 Reedy: Running foreachwikiindblist wikidataclient.dblist extensions/Wikidata/extensions/Wikibase/lib/maintenance/populateSitesTable.php --strip-protocols
  • 21:02 logmsgbot: reedy Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 15s)
  • 21:00 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: maiwiki
  • 21:00 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: maiwiki (duration: 00m 14s)
  • 20:59 logmsgbot: reedy Synchronized database lists: maiwiki (duration: 00m 14s)
  • 20:56 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: maiwiki
  • 20:56 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: maiwiki (duration: 00m 15s)
  • 20:55 logmsgbot: reedy Synchronized database lists: maiwiki (duration: 00m 18s)
  • 20:49 awight: turning off CiviMail activity record for each TY
  • 20:45 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 17s)
  • 20:39 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.25wmf7
  • 20:37 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias to 1.25wmf6
  • 20:34 logmsgbot: reedy Finished scap: testwiki to 1.25wmf7, build l10n cache (duration: 45m 03s)
  • 19:49 logmsgbot: reedy Started scap: testwiki to 1.25wmf7, build l10n cache
  • 19:48 logmsgbot: reedy scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki="fawiki" --list-file="/srv/mediawiki-staging/wmf-config/extension-list" --output="/tmp/tmp.37qNnawZ9J" --verbose' returned non-zero exit status 1 (duration: 00m 13s)
  • 19:47 logmsgbot: reedy Started scap: testwiki to 1.25wmf7, build l10n cache
  • 19:47 logmsgbot: reedy scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki="fawiki" --list-file="/srv/mediawiki-staging/wmf-config/extension-list" --output="/tmp/tmp.4wTY29z5Gg" ' returned non-zero exit status 1 (duration: 01m 08s)
  • 19:46 logmsgbot: reedy Started scap: testwiki to 1.25wmf7, build l10n cache
  • 19:17 andrewbogott: removed libvips-dev and libvips-tools from our custom repo for Trusty. The default packages seem to work fine.
  • 18:30 andrewbogott: restarting icinga on neon
  • 18:10 awight: disabled TY job
  • 18:02 ^d: elastic1022 unbanned from allocation since it has a network cable again
  • 17:14 logmsgbot: demon Synchronized wmf-config/PoolCounterSettings-eqiad.php: (no message) (duration: 00m 06s)
  • 17:02 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: frwiki getting cirrusy search (duration: 00m 05s)
  • 10:47 _joe_: installed hhvm 3.3.0-20140925+wmf4 on osmium for testing.
  • 09:11 akosiaris: depool wtp1002, wtp1007-wtp1012
  • 09:09 akosiaris: repool wtp1013,wtp1014,wtp1015,wtp1016,wtp1017
  • 07:18 ori: rolled back cluster:appserver_hhvm to version 3.3.0-20140925+wmf3 of hhvm package
  • 06:29 akosiaris: depool wtp1013, wtp1014, wtp1015, wtp1016, wtp1023 for trusty reinstallation
  • 05:53 ori: ran: salt -G php:hhvm cmd.run 'restart hhvm'
  • 05:26 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: $wgPercentHHVM: back to 5% (duration: 00m 11s)
  • 03:21 ^d|voted: restarted lucene-search-2 on search1019: it'd been timing out for a few days and filled disk with log files.
  • 02:25 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: If866e9caf: $wgPercentHHVM: 5 => 10, to test https://phabricator.wikimedia.org/T820#18870 (duration: 00m 04s)
  • 01:21 logmsgbot: ori Synchronized php-1.25wmf5/extensions/MobileFrontend: Ic82ba72b98: Update MobileFrontend for cherry-picks (duration: 00m 04s)
  • 01:21 logmsgbot: ori Synchronized php-1.25wmf6/extensions/MobileFrontend: Ic26f56c0d: Update MobileFrontend for cherry-picks (duration: 00m 05s)
  • 01:12 ^d|voted: elastic1022: banned from allocation since it's unreachable. just in case it starts flapping.
  • 01:11 mutante: elastic1022 - eth0: <NO-CARRIER
  • 01:07 ori: upgrading HHVM app servers to 3.3.0+dfsg1-1+wm2
  • 01:02 mutante: powercycling elastic1022
  • 00:45 ^d|voted: elasticsearch: rebuilding all cirrus indexes for all wikis from a screen on terbium, going to take a while. should be boring, but if causing problems kill it first and then find me.
  • 00:24 logmsgbot: demon Synchronized php-1.25wmf6/includes/parser/Parser.php: (no message) (duration: 00m 04s)
  • 00:23 logmsgbot: demon Synchronized php-1.25wmf6/includes/parser/CoreTagHooks.php: (no message) (duration: 00m 04s)
  • 00:23 logmsgbot: demon Synchronized php-1.25wmf5/includes/parser/Parser.php: (no message) (duration: 00m 04s)
  • 00:23 logmsgbot: demon Synchronized php-1.25wmf5/includes/parser/CoreTagHooks.php: (no message) (duration: 00m 05s)
  • 00:22 logmsgbot: demon Synchronized php-1.25wmf6/extensions/CirrusSearch/maintenance/updateOneSearchIndexConfig.php: (no message) (duration: 00m 04s)
  • 00:22 logmsgbot: demon Synchronized php-1.25wmf5/extensions/CirrusSearch/maintenance/updateOneSearchIndexConfig.php: (no message) (duration: 00m 04s)
  • 00:04 logmsgbot: demon Synchronized wmf-config/CirrusSearch-common.php: (no message) (duration: 00m 04s)
  • 00:03 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 07s)

November 4

  • 19:22 cmjohnson: rebooting wtp1023
  • 17:21 ejegg: updated crm from b8a1fa98b5d9252d708090c99b61fd22ebe8d2be to e9e81a828d50e8bddf98eae699c925e09b25927b
  • 16:53 akosiaris: repool wtp1017,wtp1018,wtp1019,wtp1020
  • 16:50 hashar: restarting Zuul/Jenkins entirely
  • 16:45 logmsgbot: manybubbles Synchronized php-1.25wmf6/extensions/UniversalLanguageSelector/: SWAT update uls (duration: 00m 04s)
  • 16:43 logmsgbot: manybubbles Synchronized php-1.25wmf5/extensions/UniversalLanguageSelector/: SWAT update uls (duration: 00m 04s)
  • 16:33 hashar: Shutting down Jenkins to remove a deadlock :-(
  • 16:26 hashar: Jenkins restarting Gearman client
  • 16:24 hashar: Zuul on hold, waiting for beta cluster related jobs to complete
  • 16:21 hashar: Jenkins: disconnecting/reconnecting gearman client, killing deployment-bastion.eqiad slave in an attempt to remove a deadlock (bug 70597)
  • 14:06 akosiaris: upgrading kernels on amssq*
  • 13:59 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 14s)
  • 13:58 Reedy: graceful apache on mw1193
  • 13:57 Reedy: graceful apache on mw1144
  • 13:51 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 15s)
  • 13:40 akosiaris: depool wtp1017, wtp1018, wtp1019, wtp1020 for trusty reinstall
  • 13:39 akosiaris: upgrading apache2 throughout the mw cluster
  • 13:24 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 19s)
  • 13:14 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias to 1.25wmf6
  • 13:03 logmsgbot: reedy Purged l10n cache for 1.25wmf4
  • 12:41 akosiaris: repooled wtp1021,wtp1022,wtp1023
  • 10:42 akosiaris: depooled wtp1021,wtp1022,wtp1023 for re-installation with trusty
  • 06:31 springle: force logrotate ocg1001
  • 03:32 springle: restart db2017
  • 00:12 logmsgbot: catrope Synchronized php-1.25wmf6/extensions/MultimediaViewer: SWAT (duration: 00m 03s)
  • 00:12 logmsgbot: catrope Synchronized php-1.25wmf6/extensions/MobileFrontend: SWAT (duration: 00m 04s)
  • 00:12 logmsgbot: catrope Synchronized php-1.25wmf6/extensions/VisualEditor: SWAT (duration: 00m 04s)
  • 00:09 logmsgbot: catrope Synchronized php-1.25wmf5/extensions/VisualEditor: SWAT (duration: 00m 04s)
  • 00:05 logmsgbot: catrope Synchronized wmf-config/InitialiseSettings.php: Flow on officewiki and mw.org research page (duration: 00m 04s)

November 3

  • 23:34 bd808: Changed elasticsearch template for logstash to use "doc_values" for raw fields. http://www.elasticsearch.org/blog/disk-based-field-data-a-k-a-doc-values/
  • 23:01 cscott: reconfigured OCG logstash path to use bunyan. The _type field is currently missing (used to be "OfflineContentGenerator"). Will fix tomorrow.
  • 22:32 cscott: updated OCG to version 5834af97ae80382f3368dc61b9d119cef0fe129b
  • 21:55 ejegg: enabled recurring globalcollect processor
  • 20:49 logmsgbot: maxsem Synchronized wmf-config/mobile.php: https://gerrit.wikimedia.org/r/#/c/170453/ (duration: 00m 03s)
  • 20:23 logmsgbot: maxsem Synchronized wmf-config/InitialiseSettings.php: Enable WikiGrok on enwiki (duration: 00m 04s)
  • 19:51 logmsgbot: maxsem Synchronized wmf-config/InitialiseSettings.php: Enable WikiGrok on test and test2 (duration: 00m 04s)
  • 19:43 logmsgbot: maxsem Finished scap: Build localization cache for WikiGrok (duration: 35m 09s)
  • 19:08 logmsgbot: maxsem Started scap: Build localization cache for WikiGrok
  • 18:55 awight: restarting fredge consumer
  • 18:09 awight: restarting donations queue consumer
  • 18:09 awight: update crm from f47ed6f7e55946388db1dde787ca458c27a57c5a to b8a1fa98b5d9252d708090c99b61fd22ebe8d2be
  • 16:57 akosiaris: repool wtp1024 at regular weight
  • 16:34 _joe_: rolling-restarting hhvm appservers
  • 16:25 godog: reboot ms-be2007, disk replaced but no corresponding raid0 LD
  • 16:22 andrewbogott: added yuvi to 'Ops' ldap group
  • 16:03 logmsgbot: anomie Synchronized docroot and w: (no message) (duration: 00m 10s)
  • 14:38 akosiaris: wtp1024 re-installed as trusty
  • 14:38 akosiaris: repool wtp1024 with a weight of 1 instead of 15 for now
  • 13:18 akosiaris: depool wtp1024.eqiad.wmnet in preparation for reimaging to trusty
  • 11:26 akosiaris: disable puppet on labsdb1004, labsdb1005 for postgresql reinitialization

November 2

  • 20:41 logmsgbot: hoo Synchronized php-1.25wmf6/extensions/CentralAuth/: Fix LocalPageMoveJob (duration: 00m 08s)
  • 20:41 logmsgbot: hoo Synchronized php-1.25wmf5/extensions/CentralAuth/: Fix LocalPageMoveJob (duration: 00m 09s)

November 1

  • 11:07 logmsgbot: oblivian Synchronized wmf-config/CommonSettings.php: re-set hhvm to 5% of users (duration: 00m 05s)

October 31

  • 20:46 logmsgbot: aaron Synchronized php-1.25wmf5/includes/GlobalFunctions.php: 721435c3a6c8f7c728d3fa8ec34abb0f2ef7543d (duration: 00m 07s)
  • 20:36 logmsgbot: aaron Synchronized php-1.25wmf6/includes/GlobalFunctions.php: 04c35b2ca42d7a186278882763eb853552d8441c (duration: 00m 04s)
  • 18:36 ejegg: disabled recurring globalcollect
  • 18:03 logmsgbot: maxsem Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/170358 (duration: 00m 04s)
  • 15:25 logmsgbot: demon Synchronized wmf-config/CirrusSearch-production.php: (no message) (duration: 00m 04s)
  • 14:59 logmsgbot: demon Synchronized php-1.25wmf6/extensions/CirrusSearch: (no message) (duration: 00m 04s)
  • 14:59 logmsgbot: demon Synchronized php-1.25wmf5/extensions/CirrusSearch: (no message) (duration: 00m 04s)
  • 14:56 _joe_: rotated logs on ocg1001, restarted both ocg and rsyslog
  • 14:23 akosiaris: update DNS/NTP settings, add codfw on nas1001-a,b
  • 13:27 manybubbles: reenable was uneventful. good news.
  • 13:25 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: reenable cirrus everywhere where it has been after the outage has passed (duration: 00m 03s)
  • 12:41 manybubbles: reenabled cirrus as betafeature - no spike in error logs
  • 12:41 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: reenable cirrus as betafeature everywhere (duration: 00m 05s)
  • 12:37 manybubbles: cirrus is working on test2wiki - we look to be recovered save for some loss of redundancy
  • 12:36 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: reenable cirrus on testwiki (duration: 00m 04s)
  • 12:32 logmsgbot: manybubbles Synchronized wmf-config/: Disable Cirrus accelerated regexes as we *think* they might be causing outages (duration: 00m 04s)
  • 12:31 manybubbles: restart of elasticsearch nodes got them back to responsive. Cluster isn't fully healed yet but we're better than we were. Still not sure how we got this way
  • 12:26 manybubbles: restarting all elasticsearch boxes in quick sequence. when I try restarting a frozen box another one freezes up (probably an evil request being retried on it after its buddy went down).
  • 11:46 manybubbles: heap dumps aren't happening. Even with the config to dump them on oom errors. Restarting Elasticsearch nodes to get us back to stable and going to have to investigate from another direction.
  • 11:30 manybubbles: restarting gmond on elasticsearch nodes so I can get a clearer picture of them
  • 11:24 logmsgbot: oblivian Synchronized wmf-config/InitialiseSettings.php: ES is down, long live lsearchd (duration: 00m 09s)
  • 10:52 godog: restarting elasticsearch on elastic1031, heap exhausted at 30G
  • 01:14 springle: db1040 dberror spam is https://gerrit.wikimedia.org/r/#/c/169964/ only jobrunners affected, annoying but not critical

October 30

  • 23:56 awight: update civicrm from 1f0dc2ce0ab84765c085cc0ee369a7a047c0d005 to f47ed6f7e55946388db1dde787ca458c27a57c5a
  • 23:08 logmsgbot: demon Synchronized php-1.25wmf6/extensions/CirrusSearch: (no message) (duration: 00m 04s)
  • 23:08 logmsgbot: demon Synchronized php-1.25wmf5/extensions/CirrusSearch: (no message) (duration: 00m 05s)
  • 19:02 cmjohnson: powering off elastic1009-1002 to replace ssds
  • 18:35 mutante: restarting nginx on toollabs webproxy
  • 18:35 manybubbles: unbanning elastic1006 now that it is properly configured
  • 17:54 _joe_: synchronized downsizing to 5%
  • 17:54 logmsgbot: oblivian Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 06s)
  • 17:42 _joe_: rolling restarted hhvm appservers
  • 17:38 hashar: Zuul seems to be happy. Reverted my lame patch to send Cache-Control headers since we have a cache breaker it is not needed.
  • 17:21 bd808: 10.64.16.29 is db1040 in the s4 pool
  • 17:18 bd808: "Connection error: Unknown error (10.64.16.29)" 1052 in last 5m; 2877 in last 15m
  • 17:16 hashar: Upgrading Zuul to have the status page emit a Cache-Control header bug 72766 wmf-deploy-20141030-1..wmf-deploy-20141030-2
  • 17:11 bd808: Upgraded kibana to v3.1.1 again. Better testing now that logstash is working.
  • 17:01 bd808: Logs on logstash1003 showed "Failed to flush outgoing items <Errno::EBADF: Bad file descriptor - Bad file descriptor>" on shutdown. Maybe something not quite right about elasticsearch_http plugin?
  • 17:00 logmsgbot: awight Synchronized php-1.25wmf6/includes/specials/SpecialUpload.php: Parse 'upload_source_url' message on SpecialUpload (duration: 00m 10s)
  • 16:59 bd808: restarted logstash on logstash1003. No events logged since 00:00Z
  • 16:58 logmsgbot: awight Synchronized php-1.25wmf5/includes/specials/SpecialUpload.php: Parse 'upload_source_url' message on SpecialUpload (duration: 00m 11s)
  • 16:58 bd808: restarted logstash on logstash1002. No events logged since 00:00Z
  • 16:58 bd808: restarted logstash on logstash1001. No events logged since 00:00Z
  • 16:54 akosiaris: uploaded php5_5.3.10-1ubuntu3.15+wmf1 on apt.wikimedia.org
  • 16:46 bd808: Reverted kibana to e317bc6
  • 16:44 logmsgbot: oblivian Synchronized wmf-config/CommonSettings.php: Serving 15% of anons with HHVM (ludicrous speed!) (duration: 00m 16s)
  • 16:38 bd808: Upgraded kibana to v3.1.1 via Trebuchet
  • 16:38 hashar: Zuul status page is freezing because the status.json is being cached :-/
  • 16:31 logmsgbot: awight Synchronized php-1.25wmf6/extensions/CentralNotice: push CentralNotice updates (duration: 00m 09s)
  • 16:28 logmsgbot: awight Synchronized php-1.25wmf5/extensions/CentralNotice: push CentralNotice updates (duration: 00m 11s)
  • 16:22 manybubbles: moving shards off of elastic1003 and elastic1006 so they can be restarted. elastic1003 needs hyperthreading and elastic1006 needs noatime.
  • 16:17 cmjohnson: powering off elastic1015-16 to replace ssds
  • 16:04 hashar: restarted Zuul with upgraded version ( wmf-deploy-20140924-1..wmf-deploy-20141030-1 )
  • 16:03 hashar: Stopping zuul
  • 16:00 logmsgbot: hoo Synchronized wmf-config/CommonSettings.php: Fix oauthadmin (duration: 00m 09s)
  • 15:43 hashar: Going to upgrade Zuul and monitor the result over the next hour.
  • 15:39 ottomata: starting to reimage mw1032
  • 15:29 logmsgbot: oblivian Synchronized wmf-config/CommonSettings.php: Serving 10% of anons with HHVM (duration: 00m 06s)
  • 15:22 logmsgbot: reedy Synchronized docroot and w: Fix dbtree caching (duration: 00m 15s)
  • 15:13 akosiaris: upgrading PHP on mw1113 to php5_5.3.10-1ubuntu3.15+wmf1
  • 15:07 manybubbles: moving shards off of elastic1015 and elastic1016 so we can replace their hard drives/turn on hyper threading
  • 15:07 logmsgbot: marktraceur Synchronized php-1.25wmf6/extensions/Wikidata/: [SWAT] [wmf6] Fix edit link for aliases (duration: 00m 12s)
  • 14:37 cmjohnson: powering down elastic1003-1006 to replace ssds
  • 14:33 _joe_: pooling mw1031/2 in the hhvm appservers pool
  • 12:51 _joe_: rebooting mw1030 and mw1031 to use the updated kernel
  • 12:48 akosiaris: enabled puppet on uranium
  • 11:38 _joe_: depooling mw1030 and mw1031 for reimaging as hhvm appservers
  • 10:15 _joe_: load test ended
  • 09:48 _joe_: load testing the hhvm appserver pool as well
  • 08:17 _joe_: powercycling mw1189, enabling hyperthreading
  • 08:04 _joe_: doing the same with mw1189, to see how different appserver generations respond
  • 07:25 _joe_: raising the weight of mw1114 in the api pool to test the throughput it can withstand
  • 04:47 ori: enabled heap profiling on mw1189

October 29

  • 23:42 ejegg: updated tool from 19928683a8112e9aadd71ba47f199885ba517a02 to 419fb7aa32c6d0776056968378e358ee01985565
  • 23:38 logmsgbot: maxsem Synchronized php-1.25wmf6/extensions/MobileFrontend/: (no message) (duration: 00m 07s)
  • 23:35 logmsgbot: maxsem Synchronized php-1.25wmf5/extensions/MobileFrontend/: (no message) (duration: 00m 04s)
  • 23:13 logmsgbot: catrope Synchronized php-1.25wmf6/extensions/VisualEditor: SWAT (duration: 00m 04s)
  • 23:00 mutante: restarting nginx on cp1044
  • 22:11 AaronSchulz: Re-running setZoneAccess.php for swift
  • 22:04 Krinkle: git-deploy: Deploying integration/slave-scripts a6a23ac1ec
  • 20:28 subbu: reverted parsoid to version 617e9e61b625f25d79dfaab08830c396537be632 (due to stuck processes)
  • 20:16 logmsgbot: reedy Synchronized wmf-config/mc-labs.php: noop for prod (duration: 00m 17s)
  • 20:07 arlolra: updated Parsoid to version e5bc6da6e347a65cedf24a2284e51af881dce599
  • 19:45 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 16s)
  • 19:39 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 16s)
  • 19:26 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 15s)
  • 19:17 ori: upgraded HHVM to 3.3.0+dfsg1-1+wm1
  • 18:58 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.25wmf6
  • 18:57 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias to 1.25wmf5
  • 18:47 logmsgbot: reedy Finished scap: testwiki to 1.25wmf6 and build l10n cache (duration: 28m 30s)
  • 18:18 logmsgbot: reedy Started scap: testwiki to 1.25wmf6 and build l10n cache
  • 17:24 cmjohnson: shutting down to replace ssds in elastic1002,1007,1014
  • 17:07 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: I8dd62e2cc: Re-enable hhvm beta feature on Wikidata (duration: 00m 06s)
  • 16:20 manybubbles: elastic101[7-9] look good to me - adding them to the cluster
  • 16:17 manybubbles: shutting down elasticsearch on elastic1002 - it's empty and ready to have its disk upgraded/hyper threading enabled
  • 16:05 manybubbles: ignore my last log message about 1017 - typo'd
  • 16:05 manybubbles: shutting down elasticsearch on elastic1007 - it's empty and ready to have its disk upgraded/hyper threading enabled
  • 16:04 manybubbles: shutting down elasticsearch on elastic1014 - it's empty and ready to have its disk upgraded/hyper threading enabled
  • 16:03 manybubbles: shutting down elasticsearch on elastic1017 - it's empty and ready to have its disk upgraded/hyper threading enabled
  • 15:39 manybubbles: start moving shards back to elastic1001 and elastic1008 now that they are up with hyperthreading on
  • 15:37 Reedy: deleted php-1.24wmf21 from mediawiki-installation
  • 15:36 Reedy: deleted php-1.24wmf20 from mediawiki-installation
  • 15:35 Reedy: deleted php-1.24wmf19 from mediawiki-installation
  • 15:35 akosiaris: uploaded apertium-apy_0.1+svn~57689-1 on apt.wikimedia.org
  • 15:23 manybubbles: unbanned elastic1013 now that it is back with hyper threading on
  • 15:21 logmsgbot: reedy Purged l10n cache for 1.25wmf3
  • 15:20 logmsgbot: reedy Purged l10n cache for 1.25wmf2
  • 15:19 bd808: Restarted logstash on logstash1002 to fix OCG and hadoop log events not being recorded
  • 15:15 bd808: Restarted logstash on logstash1001. No MW events were being added to the index.
  • 15:10 logmsgbot: anomie Synchronized wmf-config/throttle.php: SWAT: Raise account creation throttle at cawiki temporarily gerrit:169708 (duration: 00m 09s)
  • 15:07 logmsgbot: anomie Synchronized php-1.25wmf5/extensions/Wikidata: SWAT: Fix WikiData "add links" widget JS error gerrit:169700 (duration: 00m 15s)
  • 15:07 Reedy: Killed old (pre 1.25) l10nupdate cache dirs from tin:/var/lib/l10nupdate
  • 15:00 manybubbles: started moving shards off of elastic1001, elastic1008, and elastic1013 so we can bounce them to enable hyper threading
  • 14:55 manybubbles: started rolling shards back to elastic1001, elastic1008, and elastic1013 after hard drive upgrade
  • 14:21 ottomata: set request.required.acks = 2 for all varnishkafkas
  • 13:22 manybubbles: lowered replication on logstash's template for new indexes from 3 way to 2 way
  • 13:20 logmsgbot: demon Synchronized wmf-config/lucene-production.php: unbreak lsearchd for commons, enwikitionary, etc (duration: 00m 04s)
  • 13:11 manybubbles: lowered redundancy on logstash from 3 way to 2 way
  • 13:01 cmjohnson: powering down/replacing elastic1017 and elastic1018
  • 12:59 cmjohnson: disabling puppet on elastic1017 and 1018
  • 12:01 cmjohnson: elastic1001, elastic1008 and elastic1013 powering down to replace ssds RT7779
  • 12:01 springle: xtrabackup clone db1007 to db2029
  • 07:16 ori: repooled mw1189 w/patched hhvm (<https://phabricator.wikimedia.org/T820#16428>)
  • 03:39 ori: upgraded mw1114 to custom package with patch from https://phabricator.wikimedia.org/T820#16428 applied

October 28

  • 23:12 logmsgbot: demon Synchronized php-1.25wmf5/extensions/Wikidata: (no message) (duration: 00m 10s)
  • 23:05 _joe_: removed stale heap profile files from /run/hhvm on mw1114
  • 23:02 logmsgbot: demon Synchronized wmf-config/CommonSettings.php: extension distributor stuffs (duration: 00m 05s)
  • 22:09 ejegg: updated crm from ffa543cab3eb508fa38b94c6de2643d168b0d507 to 1f0dc2ce0ab84765c085cc0ee369a7a047c0d005
  • 21:05 awight: reverted payments, from 647d1eb7d8cccb73fabf5ffded9f713d24576c37 to e3d235f881282120409e1a6ed1a3908ce9a63c26
  • 21:02 hashar: Zuul back in action.
  • 20:54 hashar: Zuul deadlocked again. Restarting Gearman plugin on Jenkins
  • 20:53 awight: updated payments from 525988487d6bbd08ddad50badd88e34e34104292 to 647d1eb7d8cccb73fabf5ffded9f713d24576c37
  • 20:29 manybubbles: removed /etc/elasticsearch/*.dpkg-dist from logstash machines - that was breaking logging for some reason. magic.
  • 20:24 manybubbles: disabling puppet on logstash1003 and trying to run elasticsearch by hand to learn more about why it's borked.
  • 19:42 andrewbogott: deleting unused labs projects: commons-dev, echo, farsi-wikitest
  • 19:33 bd808: Elasticsearch not recovering indices at all on logstash1003 and no logging output
  • 19:11 logmsgbot: reedy Synchronized php-1.25wmf5/extensions/CirrusSearch/: (no message) (duration: 00m 15s)
  • 19:07 Reedy: frwikibooks collation updated
  • 19:06 Reedy: Running mwscript updateCollation.php --wiki=frwikibooks --previous-collation=uppercase
  • 19:05 logmsgbot: reedy Synchronized wmf-config/: All of the config changes! (duration: 00m 14s)
  • 18:53 logmsgbot: reedy Synchronized wmf-config/: Bump cache epoch for Wikidata (duration: 00m 14s)
  • 18:52 logmsgbot: reedy Finished scap: Split Cite extension, scap to build l10n cache for CiteThisPage (duration: 33m 52s)
  • 18:29 bd808|LUNCH: restarted elasticsearch node on logstash1003
  • 18:18 logmsgbot: reedy Started scap: Split Cite extension, scap to build l10n cache for CiteThisPage
  • 18:11 logmsgbot: reedy Synchronized wmf-config/: Set = true for Italian Wikipedia in November-December 2014 (duration: 00m 14s)
  • 18:05 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias to 1.25wmf5
  • 17:27 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: I56082795: Modify $wmgAddWikiNotify for use by notifyNewProjects (duration: 00m 05s)
  • 16:59 bd808: restarted logstash on logstash1002 to try and get gelf input events into kibana again
  • 16:58 bd808: disk utilization on logstash100[123] greater than 80%
  • 16:57 cscott: no logs for ocg/parsoid on logstash since 2014-10-27T18:50:46.104Z/2014-10-27T18:50:45.977Z (respectively)
  • 16:57 logmsgbot: andrew Synchronized php-1.25wmf5/extensions/OpenStackManager: (no message) (duration: 00m 03s)
  • 16:56 bd808: No new logs in /var/log/elasticsearch for logstash100[123] since Sep 30 06:25
  • 16:55 logmsgbot: andrew Synchronized php-1.25wmf5/extensions/OpenStackManager: (no message) (duration: 00m 02s)
  • 16:44 logmsgbot: reedy Synchronized php-1.25wmf4/extensions/OpenStackManager: (no message) (duration: 00m 14s)
  • 15:47 andrewbogott: running sync-common on virt1000
  • 15:47 logmsgbot: manybubbles Synchronized php-1.25wmf4/extensions/OpenStackManager/: SWAT update openstackmanager (duration: 00m 04s)
  • 15:45 logmsgbot: manybubbles Synchronized php-1.25wmf5/extensions/OpenStackManager/: SWAT update openstackmanager (duration: 00m 04s)
  • 15:24 logmsgbot: manybubbles Synchronized wmf-config/: SWAT cirrus regex queues too small? (duration: 00m 05s)
  • 15:11 logmsgbot: manybubbles Synchronized php-1.25wmf5/extensions/Wikidata/: SWAT update wikidata (duration: 00m 10s)
  • 15:03 logmsgbot: manybubbles Synchronized wmf-config/: SWAT cirrus config updates - (hopefully) faster regexes (duration: 00m 06s)
  • 14:51 godog: rolling-restart of eqiad ms-fe* after https://gerrit.wikimedia.org/r/#/c/167310/
  • 14:04 godog: reload swift frontend in eqiad after password rotation
  • 14:04 logmsgbot: demon Synchronized wmf-config/PrivateSettings.php: (no message) (duration: 00m 04s)
  • 13:48 logmsgbot: manybubbles Synchronized php-1.25wmf5/extensions/CirrusSearch/: (no message) (duration: 00m 05s)
  • 13:47 logmsgbot: manybubbles Synchronized php-1.25wmf4/extensions/CirrusSearch/: (no message) (duration: 00m 11s)
  • 01:01 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: Turn Cirrus back on basically everywhere. If Elasticsearch freaks out again just revert I73ae276e to get back to lsearchd again (duration: 00m 04s)
  • 00:43 logmsgbot: ori Synchronized php-1.25wmf4/extensions/WikimediaEvents/WikimediaEventsHooks.php: I4adffaa26: Actually unset the HHVM cookie (duration: 00m 03s)
  • 00:43 logmsgbot: ori Synchronized php-1.25wmf5/extensions/WikimediaEvents/WikimediaEventsHooks.php: I4adffaa26: Actually unset the HHVM cookie (duration: 00m 03s)
  • 00:27 awight: reenabling recurring GlobalCollect job
  • 00:07 awight: updated crm from 9bb50403616d80aa8d39a89ab59965f53e9e3f3d to ffa543cab3eb508fa38b94c6de2643d168b0d507

October 27

  • 23:52 bd808: Restarted logstash service on logstash1001 because I was not seeing any events from MW make it into kibana
  • 23:27 logmsgbot: maxsem Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/169229/ for reals now (duration: 00m 04s)
  • 23:26 Reedy: restarted logstash on logstash1001
  • 23:23 logmsgbot: maxsem Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/169229/ (duration: 00m 04s)
  • 23:22 Tim: on mw1114: disabled puppet, enabled Eval.PerfPidMap, restarted hhvm
  • 23:21 awight: updated crm from 5b395c37dc596736ecafceeb156221e3751bfe37 to 9bb50403616d80aa8d39a89ab59965f53e9e3f3d
  • 23:21 awight: disabling recurring globalcollect job
  • 23:20 logmsgbot: maxsem Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/168771/ (duration: 00m 04s)
  • 23:17 logmsgbot: maxsem Synchronized php-1.25wmf4/extensions/VisualEditor/: (no message) (duration: 00m 04s)
  • 23:16 logmsgbot: maxsem Synchronized php-1.25wmf4/extensions/MobileFrontend/: (no message) (duration: 00m 05s)
  • 23:14 logmsgbot: maxsem Synchronized php-1.25wmf5/extensions/MobileFrontend/: (no message) (duration: 00m 04s)
  • 23:14 logmsgbot: maxsem Synchronized php-1.25wmf5/extensions/VisualEditor/: (no message) (duration: 00m 05s)
  • 23:14 logmsgbot: maxsem Synchronized php-1.25wmf5/extensions/Wikidata/: (no message) (duration: 00m 10s)
  • 23:13 logmsgbot: maxsem Synchronized php-1.25wmf3/extensions/Wikidata/: (no message) (duration: 00m 12s)
  • 23:06 logmsgbot: maxsem Synchronized wmf-config/Wikibase.php: https://gerrit.wikimedia.org/r/#/c/169192/ (duration: 00m 04s)
  • 22:58 awight: reenabling recurring globalcollect job
  • 22:54 awight: rollback civicrm from 9bb50403616d80aa8d39a89ab59965f53e9e3f3d to 5b395c37dc596736ecafceeb156221e3751bfe37
  • 22:53 awight: updated civicrm from 5b395c37dc596736ecafceeb156221e3751bfe37 to 9bb50403616d80aa8d39a89ab59965f53e9e3f3d
  • 22:50 logmsgbot: aaron Synchronized wmf-config/flaggedrevs.php: Removed $wgFlaggedRevsProtectQuota for enwiki (duration: 00m 03s)
  • 22:46 awight: disabling recurring GlobalCollect job
  • 22:45 Tim: activated heap profiling on mw1114
  • 22:21 AaronSchulz: Running cleanupBlocks.php on all wikis
  • 22:18 logmsgbot: aaron Synchronized php-1.25wmf4/maintenance: 64fe61e0dbfea84d2bab4c17bf01f5dfdf5cc3b5 (duration: 00m 04s)
  • 22:12 logmsgbot: aaron Synchronized wmf-config/CommonSettings.php: Stop GWT wgJobBackoffThrottling values from getting lost (duration: 00m 03s)
  • 20:35 subbu: deployed parsoid sha 617e9e61
  • 20:26 cscott: updated OCG to version 60b15d9985f881aadaa5fdf7c945298c3d7ebeac
  • 20:10 logmsgbot: maxsem Synchronized php-1.25wmf4/extensions/GeoData: GeoData back to normal (duration: 00m 03s)
  • 19:39 manybubbles: after restarting elasticsearch we expected to get memory errors again. no such luck so far....
  • 18:57 manybubbles: completed restarting elasticsearch cluster. now it'll make a useful file on out of memory errors. raised the recovery throttling so it'll recover fast enough to cause oom errors
  • 18:47 logmsgbot: maxsem Synchronized php-1.25wmf4/extensions/GeoData: live hack to disable geosearch (duration: 00m 04s)
  • 18:37 manybubbles: note that this is a restart without waiting for the cluster to go green after each restart. I expect lots of whining from icinga. This will cause us to lose some updates but should otherwise be safe.
  • 18:34 manybubbles: restarting elasticsearch servers to pick up new gc logging and to reset them into a "working" state so they can have their gc problem again and we can log it properly this time.
  • 18:15 logmsgbot: aaron Synchronized wmf-config/CommonSettings.php: Remove obsolete flags (all of them) from $wgAntiLockFlags (duration: 00m 07s)
  • 17:53 cmjohnson: replacing disk /dev/sdl slot 11 ms-be1013
  • 17:37 _joe_: uploaded a version of jemalloc for trusty with --enable-prof
  • 16:31 ^d: elasticsearch: temporarily raised node_concurrent_recoveries from 3 to 5.
  • 15:32 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: Enable Cirrus as secondary everywhere, brings back GeoData (duration: 00m 04s)
  • 15:08 manybubbles: It's unclear how much of the master going haywire is something that'll be fixed in elasticsearch 1.4. They've done a lot of work there on the cluster state communication.
  • 15:07 manybubbles: for posterity: 10/18 of the elasticsearch servers had gotten to the point where they couldn't free any heap. It's currently not clear to me why they did that. This caused the cluster to basically collapse. The master node kept being unable to communicate with anyone because everyone was pausing for multiple minutes between replies. The cluster handshaking couldn't cope with that and promptly got itself into a state where nodes were both part of the cluster and not part of the cluster at the same time. That's bad.
  • 15:03 manybubbles: restarting gmond on all elasticsearch systems because stats aren't updating properly in ganglia and usually that helps
  • 15:02 manybubbles: restarted a bunch of the elasticsearch nodes that had their heap full. wasn't able to get a heap dump on any of them because they all froze while trying to get the heap dump.
  • 14:32 ^d: elasticsearch: disabling replica allocation, less things moving about if we restart cluster
  • 13:47 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: fall back to lsearchd for a bit (duration: 00m 05s)
  • 13:41 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 05s)
  • 13:29 manybubbles: restarted elasticsearch on elastic1017 - memory was totally full there
  • 13:21 manybubbles: elastic1008 is logging gc issues. restarting it because that might help it
  • 05:04 springle: forced logrotate ocg1001
  • 03:36 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Oct 27 03:36:39 UTC 2014 (duration 36m 38s)
  • 02:27 logmsgbot: LocalisationUpdate completed (1.25wmf5) at 2014-10-27 02:27:45+00:00
  • 02:17 logmsgbot: LocalisationUpdate completed (1.25wmf4) at 2014-10-27 02:17:08+00:00

October 26

  • 23:46 Krinkle: Force restarted Zuul
  • 15:14 Krinkle: Jenkins/Zuul is stuck as of 20 hours ago
  • 15:06 _joe_: restarted hhvm on mw1114, memory nearly exhausted
  • 03:36 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Oct 26 03:36:20 UTC 2014 (duration 36m 19s)
  • 02:25 logmsgbot: LocalisationUpdate completed (1.25wmf5) at 2014-10-26 02:25:47+00:00
  • 02:15 logmsgbot: LocalisationUpdate completed (1.25wmf4) at 2014-10-26 02:15:12+00:00

October 25

  • 22:49 paravoid: upgrading JunOS on cr1-ulsfo
  • 22:32 paravoid: scheduling downtime for all ulsfo -lb- & cr1/2-ulsfo
  • 21:30 logmsgbot: ori Synchronized php-1.25wmf5/extensions/CentralNotice/CentralNotice.hooks.php: Iee2072ac7: Make sure we declare globals before using them (duration: 00m 06s)
  • 21:30 logmsgbot: ori Synchronized php-1.25wmf4/extensions/CentralNotice/CentralNotice.hooks.php: Iee2072ac7: Make sure we declare globals before using them (duration: 00m 06s)
  • 20:41 bd808: updated logstash-* labs instances to salt minion 2014.1.11 (thanks for the ping apergos)
  • 14:43 apergos: all active labs instances now running salt minion 2014.1.11 except for: logstash-* (have their own master), fabapi (pingable, can't ssh on), upload-wizard (running oneiric, not setting up a repo for that!). instances shutoff or w/ nova error were left untouched
  • 03:46 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Oct 25 03:46:48 UTC 2014 (duration 46m 47s)
  • 02:29 logmsgbot: LocalisationUpdate completed (1.25wmf5) at 2014-10-25 02:29:29+00:00
  • 02:18 logmsgbot: LocalisationUpdate completed (1.25wmf4) at 2014-10-25 02:18:14+00:00
  • 00:27 awight: updated DjangoBannerStats from cf5a875d49f4c4cf229d7f864a73d4c2f588ebf9 to a3038f133d64c737d3987bd1c37a987fd3003dd6

October 24

  • 22:40 akosiaris: puppet disabled on uranium, do not enable
  • 20:52 andrewbogott: revived virt1006 on a probationary basis. It's running compute but is disabled so new instances won't be scheduled there. I've moved a few test instances there to see how it behaves.
  • 20:36 logmsgbot: andrew Synchronized wmf-config/wikitech.php: (no message) (duration: 00m 04s)
  • 20:28 Reedy: sync-common on mw1088
  • 20:23 mutante: mw1088 - gzipping core dump files, disabled core dumps, restarted apache
  • 20:15 mutante: mw1088 - gzip other_vhosts_access.log.1 - Avail. 38G
  • 20:15 Reedy: / full on mw1088 due to apache core dumps
  • 20:09 Reedy: running sync-common on mw1041
  • 20:04 mutante: powercycled mw1041
  • 20:03 logmsgbot: reedy Synchronized php-1.25wmf5/extensions/SemanticForms/: noop for prod (duration: 00m 17s)
  • 20:01 Reedy: mw1041 is down
  • 20:01 Reedy: mw1088 has a full /
  • 20:00 logmsgbot: reedy Synchronized php-1.25wmf4/extensions/SemanticForms/: noop for prod (duration: 00m 16s)
  • 19:53 bblack: nickel's basically dead, uranium has been promoted to prod ganglia a little early for now
  • 19:22 awight: updated payments from 6fa864d4aaa22b9f271de4bc662be68bb0b40b56 to 525988487d6bbd08ddad50badd88e34e34104292
  • 18:55 ori: repooled mw1189 to do heap profiling on production api workload.
  • 17:58 mutante: stat1001 - Duplicate declaration: Package[nodejs]
  • 17:07 cmjohnson: getting ready to replace a failed disk on ganglia (server:nickel)...it will be offline for a few minutes
  • 17:05 ejegg: updated dash from 58fda9403dd33e4d47238f119b6bb2b2905856b1 to 69c9330d6983873ffa9bb87fcd783be03382bdfc
  • 15:50 awight: campaigns reenabled
  • 15:40 awight: payments db migrated to 1.23 schema
  • 15:37 awight: updated payments to REL1_23, 6fa864d4aaa22b9f271de4bc662be68bb0b40b56
  • 15:18 awight: payments in maintenance mode
  • 14:57 robh: francium going offline, ignore any icinga warning
  • 14:37 andrewbogott: running sync-common on virt1000
  • 14:36 logmsgbot: andrew Synchronized wmf-config/wikitech.php: (no message) (duration: 00m 03s)
  • 14:35 logmsgbot: andrew Synchronized wmf-config/wikitech.php: (no message) (duration: 00m 02s)
  • 13:39 akosiaris: disabled puppet on uranium. Testing ganglia with SSDs
  • 12:10 akosiaris: restarted gmetad on nickel, it was not responding on port 8654
  • 05:04 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Oct 24 05:04:33 UTC 2014 (duration 4m 32s)
  • 03:22 logmsgbot: LocalisationUpdate completed (1.25wmf5) at 2014-10-24 03:22:36+00:00
  • 02:46 logmsgbot: LocalisationUpdate completed (1.25wmf4) at 2014-10-24 02:46:15+00:00

October 23

  • 23:39 logmsgbot: catrope Synchronized php-1.25wmf4/extensions/CentralAuth/: SWAT (duration: 00m 04s)
  • 23:38 logmsgbot: catrope Synchronized php-1.25wmf4/extensions/AntiSpoof/: SWAT (duration: 00m 06s)
  • 23:29 logmsgbot: catrope Synchronized php-1.25wmf5/extensions/TimedMediaHandler/: SWAT (duration: 00m 04s)
  • 23:29 logmsgbot: catrope Synchronized php-1.25wmf5/extensions/UploadWizard/: SWAT (duration: 00m 06s)
  • 23:28 logmsgbot: catrope Synchronized php-1.25wmf4/extensions/UploadWizard/: SWAT (duration: 00m 04s)
  • 23:28 logmsgbot: catrope Synchronized php-1.25wmf4/extensions/TimedMediaHandler/: SWAT (duration: 00m 04s)
  • 23:27 logmsgbot: catrope Synchronized php-1.25wmf4/includes/api/ApiFormatBase.php: SWAT (duration: 00m 04s)
  • 22:42 hashar: Jenkins is all good now.
  • 22:36 hashar: Jenkins restarting
  • 22:28 hashar: preparing jenkins for restart
  • 21:54 hashar: Jenkins: the Gearman plugin is holding a lock on deployment-bastion slave that prevents it from running any job :-/
  • 21:51 ejegg: updated civicrm from ad3386cd0f9b776e2fded7c4e6b1195e05ed669c to 937df4dacae0dd620ae9e8fed13566d51c1b18a4
  • 21:46 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1018 (duration: 00m 06s)
  • 21:43 hashar: Jenkins: disabling / reenabling Gearman plugin
  • 21:35 hashar: Jenkins: disconnected / reconnected slave node deployment-bastion.eqiad
  • 21:15 awight: updated crm from 03b15f7dad58ce61894d632e8fbebd2ae76ae4d0 to ad3386cd0f9b776e2fded7c4e6b1195e05ed669c
  • 21:04 ejegg: updated civicrm from 0a3ab0f18ce726898d14adcbe6ab08411c9e3e82 to 03b15f7dad58ce61894d632e8fbebd2ae76ae4d0
  • 19:29 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.25wmf5
  • 19:26 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias to 1.25wmf4
  • 19:19 logmsgbot: reedy Finished scap: testwiki to 1.25wmf5 (duration: 32m 55s)
  • 18:46 logmsgbot: reedy Started scap: testwiki to 1.25wmf5
  • 18:34 awight: updated crm from d6a75b6df4482de61da372fa653902db7ca12766 to 0a3ab0f18ce726898d14adcbe6ab08411c9e3e82
  • 18:19 logmsgbot: tgr Synchronized wmf-config/InitialiseSettings.php: Disable ImageMetrics on non-public wikis (duration: 00m 05s)
  • 18:03 logmsgbot: tgr Synchronized wmf-config/InitialiseSettings.php: Enable ImageMetrics on all wikis (duration: 00m 05s)
  • 17:14 logmsgbot: tgr Synchronized wmf-config/InitialiseSettings.php: Enable ImageMetrics on group0 (duration: 00m 05s)
  • 17:08 logmsgbot: tgr Finished scap: Deploying ImageMetrics extension (duration: 32m 04s)
  • 16:54 paravoid: cr2-ulsfo: upgrading junos again
  • 16:36 logmsgbot: tgr Started scap: Deploying ImageMetrics extension
  • 15:46 paravoid: preparing to upgrade JunOS on cr2-ulsfo
  • 15:31 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: db1065 to normal load (duration: 00m 08s)
  • 15:19 logmsgbot: anomie Synchronized php-1.25wmf4/api.php: SWAT: Include ApiMain construction in api.php try-catch block gerrit:168296 (duration: 00m 09s)
  • 15:19 logmsgbot: anomie Synchronized php-1.25wmf4/includes/api/ApiMain.php: SWAT: Include ApiMain construction in api.php try-catch block gerrit:168128 (duration: 00m 09s)
  • 15:11 logmsgbot: anomie Synchronized php-1.25wmf4/includes/api/ApiFormatFeedWrapper.php: SWAT: Fix ApiFormatFeedWrapper gerrit:168128 (duration: 00m 09s)
  • 14:25 ottomata: varnishkafka request.required.acks is now 2 for text, mobile, and bits.
  • 12:46 hashar: killed left over java/jenkins process on gallium
  • 12:09 Krinkle: Zuul/Jenkins stuck. Tried various gearman/zuul resets. Restarting Jenkins now.
  • 07:25 _joe_: restarted hhvm on mw1114, depooled the server
  • 05:39 Tim: on mw1189 testing some URLs at a high rate, attempting to induce measurable memory leak
  • 05:06 Tim: reverted unexplained uncommitted modification of palladium:/srv/pybal-config/pybal/eqiad/api which repooled mw1189
  • 03:41 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Oct 23 03:41:49 UTC 2014 (duration 41m 48s)
  • 03:06 logmsgbot: cscott Synchronized wmf-config/filebackend.php: fix using a file from commons with file name length between 140 and 159 (duration: 00m 20s)
  • 03:01 Krinkle: git-deploy: Deploying integration/slave-scripts 157ef23
  • 02:31 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1065, warm up (duration: 00m 06s)
  • 02:27 logmsgbot: LocalisationUpdate completed (1.25wmf4) at 2014-10-23 02:27:48+00:00
  • 02:15 logmsgbot: LocalisationUpdate completed (1.25wmf3) at 2014-10-23 02:15:22+00:00
  • 02:10 springle: removed old /var/log/ocg.log* on ocg1002, forced a logrotate
  • 02:06 springle: upgrade reboot db1065

October 22

  • 23:25 logmsgbot: demon Synchronized wmf-config/CommonSettings-labs.php: (no message) (duration: 00m 04s)
  • 23:25 logmsgbot: demon Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 04s)
  • 23:22 logmsgbot: demon Synchronized php-1.25wmf4/extensions/Collection: (no message) (duration: 00m 04s)
  • 23:21 logmsgbot: demon Synchronized php-1.25wmf3/extensions/Collection: (no message) (duration: 00m 04s)
  • 23:08 mutante: added jhernandez to wmf LDAP group
  • 22:54 bd808: forced puppet run on logstash1003 to pick up https://gerrit.wikimedia.org/r/#/c/168199/
  • 22:47 bd808: forced puppet run on logstash1002 to pick up https://gerrit.wikimedia.org/r/#/c/168199/
  • 22:42 bd808: forced puppet run on logstash1001 to pick up https://gerrit.wikimedia.org/r/#/c/168199/
  • 22:21 bblack: depooled amssq42 (esams text) for trusty testing
  • 21:38 bd808: killed duplicate logstash services running on logstash1001 and restarted
  • 21:16 arlolra: updated OCG to version e977e2c8ecacea2b4dee837933cc2ffdc6b214cb
  • 21:06 bd808: forced puppet run on logstash1001 to pick up https://gerrit.wikimedia.org/r/#/c/168182/
  • 20:40 bd808: forced puppet run on logstash1001 to pick up https://gerrit.wikimedia.org/r/168089
  • 20:37 mutante: powercycling unresponsive ms-be1012 (this happened before, search SAL for hostname)
  • 20:25 arlolra: updated Parsoid to version 2a8dc85ce676391acd8c6255c4f94250612c9ee2
  • 16:59 andrewbogott: reinstalling OS on virt1006
  • 15:57 logmsgbot: marktraceur Synchronized wmf-config/InitialiseSettings.php: [SWAT] Send collection logs to logstash. (duration: 00m 05s)
  • 15:31 ottomata: setting request.required.acks to 2 for mobile and text varnishkafka's (mobile was set to 2 yesterday)
  • 15:23 _joe_: rolling restart of hhvm appservers, to alleviate memory issues
  • 15:09 logmsgbot: marktraceur Synchronized wmf-config/InitialiseSettings.php: [SWAT] Pre-render thumbnails on upload on Commons (duration: 00m 05s)
  • 15:05 logmsgbot: marktraceur Synchronized wmf-config/CommonSettings-labs.php: [SWAT] Re-enable PediaPress POD in production. (duration: 00m 05s)
  • 15:05 logmsgbot: marktraceur Synchronized wmf-config/CommonSettings.php: [SWAT] Re-enable PediaPress POD in production. (duration: 00m 05s)
  • 13:38 godog: catch-up swiftrepl sync eqiad -> codfw for commons containers
  • 09:41 hashar: Zuul/Jenkins in a deadlock
  • 09:08 hashar: Restarting Jenkins
  • 09:07 hashar: Jenkins: upgrading gearman-plugin from 0.0.7-1-g3811bb8 to 0.1.0-1-gfa5f083, i.e. the latest version + 1 commit
  • 03:54 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Oct 22 03:54:53 UTC 2014 (duration 54m 52s)
  • 02:30 logmsgbot: LocalisationUpdate completed (1.25wmf4) at 2014-10-22 02:30:07+00:00
  • 02:17 logmsgbot: LocalisationUpdate completed (1.25wmf3) at 2014-10-22 02:17:49+00:00

October 21

  • 23:21 logmsgbot: maxsem Synchronized php-1.25wmf4/includes/PrefixSearch.php: https://gerrit.wikimedia.org/r/#/c/167982/ (duration: 00m 03s)
  • 23:09 logmsgbot: maxsem Synchronized php-1.25wmf3/extensions/MobileFrontend/: SWAT (duration: 00m 04s)
  • 23:08 logmsgbot: maxsem Synchronized php-1.25wmf4/extensions/MobileFrontend/: SWAT (duration: 00m 04s)
  • 23:07 logmsgbot: maxsem Synchronized php-1.25wmf4/extensions/CentralAuth/: SWAT (duration: 00m 04s)
  • 22:47 mutante: radium - installed OS, signing puppet cert requests, initial run ...
  • 22:42 legoktm: manually finished global rename for BonumTV --> Karypal which failed due to page move timeout
  • 22:13 andrewbogott: deleted unused labs projects: versionview, feeds, datadog, fundraising-awight, simplewiki, mediawiki-custom-de, fundraising, sartoris, wikibits, incubator, wikiversity-sandbox, data4all
  • 21:33 K4-7131: adjusting payments antifraud filters
  • 21:29 logmsgbot: ebernhardson Synchronized php-1.25wmf4/extensions/LiquidThreads/api/ApiQueryLQTThreads.php: Bump LQT in 1.25wmf4 (duration: 00m 04s)
  • 20:41 cscott: updated OCG to version 523c8123cd826c75240837c42aff6301032d8ff1
  • 20:40 logmsgbot: hoo Synchronized wmf-config/CommonSettings.php: Set extendwatchlist = 0 (duration: 00m 08s)
  • 20:37 hashar: lanthanum /var/lib/jenkins-slave/tmpfs went full again. cleared up a bunch of files
  • 20:22 _joe_: installing new hhvm packages on the depooled server mw1189, for debugging
  • 20:14 AaronSchulz: Deployed deployment/jobrunner to d426235e10edc682b532e7b4f2b02bb9414661ba
  • 19:47 logmsgbot: hoo Synchronized wmf-config/InitialiseSettings.php: Add import sources for orwikisource (duration: 00m 08s)
  • 19:12 qchris: Added its-phabricator plugin (d425a5ded909ee73df53d5e6d91d28014d0be375) into gerrit
  • 18:40 chasemp: ran puppet on gerrit, which restarted the service
  • 18:25 logmsgbot: reedy Synchronized php-1.25wmf3/extensions/Collection/: (no message) (duration: 00m 13s)
  • 18:24 logmsgbot: reedy Synchronized php-1.25wmf4/extensions/Collection/: (no message) (duration: 00m 14s)
  • 18:21 logmsgbot: reedy Synchronized database lists: (no message) (duration: 00m 26s)
  • 18:20 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 18s)
  • 18:07 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias to 1.25wmf4
  • 16:54 _joe_: stopping puppet on mw1114 in order to do some jemalloc debugging
  • 16:40 K4-713: adjusted payments fraud filters for WP test
  • 14:45 godog: start catch up swiftrepl on ms-fe1003 for 'notcommons' containers
  • 14:28 ottomata: set vm.dirty_writeback_centisecs = 200 (was 500) on analytics1021
  • 14:23 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1065 (duration: 00m 06s)
  • 11:56 godog: silenced *-lb.ulsfo.wikimedia.org
  • 11:43 godog: drained ulsfo via DNS, GTT link problems
  • 11:09 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1066, warm up (duration: 00m 06s)
  • 09:09 akosiaris: enabled icinga-wm again
  • 08:50 akosiaris: temporarily killed icinga-wm
  • 04:21 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Oct 21 04:21:19 UTC 2014 (duration 21m 18s)
  • 03:06 logmsgbot: LocalisationUpdate completed (1.25wmf4) at 2014-10-21 03:06:47+00:00
  • 02:33 logmsgbot: LocalisationUpdate completed (1.25wmf3) at 2014-10-21 02:33:35+00:00
  • 02:26 mutante: rebooting iron for upgrades
  • 02:12 mutante: restarting Apache on strontium
  • 01:58 mutante: restarting apache on puppetmaster, temp. stopping icinga-wm
  • 01:31 K4-713: adjusted fraud filters on payments
  • 00:32 logmsgbot: ori Synchronized wmf-config: Id01fe7aac: Turn off spammy message cache log (duration: 00m 05s)

October 20

  • 23:56 awight: updated payments to 4cf8eb06a4746478c6424648c94688bf460cf63d
  • 23:31 springle: upgrade db1004 trusty and reboot
  • 23:20 mutante: installing package upgrades on iron
  • 23:03 logmsgbot: demon Synchronized wmf-config/CommonSettings-labs.php: no-op, for completeness (duration: 00m 05s)
  • 22:38 awight|purgreen: updating payments config with French Snowflake hack
  • 22:37 Tim: doing load testing on mw1189
  • 22:28 Tim: enabled puppet on mw1189
  • 21:08 cscott: redis-cli srem "deploy:ocg/ocg:minions" tantalum.eqiad.wmnet
  • 21:08 bblack: baham running gdnsd-2.1.0 test pkg
  • 21:08 bd808: Deployed iegreview 203d509 (Disable strict variables for twig)
  • 21:01 cscott: updated OCG to version ea10c93aca9bc1cae34f284fd74bb05d4b6a8cc6
  • 20:33 jgage: ulsfo repooled in dns
  • 20:17 jgage: ulsfo fpc 1 mic 1 card swap complete
  • 20:11 jgage: beginning router card swap in ulsfo
  • 20:11 subbu: deployed Parsoid version d4567e9f
  • 20:01 paravoid: cr1-ulsfo: reenable ospf/ospf3 (GTT is stable)
  • 19:13 mutante: importing schema, data, users into mysql for iegreview
  • 17:41 apergos: all trusty and lucid hosts now running salt 2014.1.11 (this includes labcontrol2001, salt master for future codfw labs)
  • 16:48 logmsgbot: reedy Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 14s)
  • 16:47 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: orwikisource
  • 16:44 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 15s)
  • 16:44 logmsgbot: reedy Synchronized database lists: orwikisource (duration: 00m 13s)
  • 16:43 cmjohnson: reseating pem3 cr2-eqiad
  • 16:43 apergos: all precise hosts salt updated to 2014.1.11, this includes tin (deployment) and virt1000 (salt master for labs). Not updated: virt1006 (inaccessible)
  • 15:58 logmsgbot: anomie Synchronized wmf-config: SWAT: Enable wgSecurePollUseNamespace for testwiki gerrit:167592 (duration: 00m 10s)
  • 15:57 logmsgbot: anomie Finished scap: (no message) (duration: 18m 22s)
  • 15:39 logmsgbot: anomie Started scap: (no message)
  • 15:38 logmsgbot: anomie Synchronized php-1.25wmf4/extensions/SecurePoll/: SWAT: Update SecurePoll for testing on testwiki gerrit:167586 (duration: 00m 10s)
  • 15:20 _joe_: disabling puppet on mw1189 to do some hhvm testing
  • 15:20 logmsgbot: anomie Synchronized php-1.25wmf3/resources/lib/oojs-ui/: SWAT: OOJS-UI bug fixes gerrit:167344 (duration: 00m 12s)
  • 15:19 logmsgbot: anomie Synchronized php-1.25wmf3/extensions/VisualEditor/: SWAT: VE bug fixes gerrit:167344 (duration: 00m 10s)
  • 15:08 logmsgbot: anomie Synchronized php-1.25wmf4/extensions/VisualEditor/: SWAT: VE bug fixes gerrit:167577 (duration: 00m 10s)
  • 14:26 paravoid: cr1-ulsfo: deactivating ospf/ospf3 on GTT ulsfo-eqiad link
  • 13:26 paravoid: cr2-ulsfo: "request chassis mic {off,on}line fpc-slot 1 mic-slot 1" to reboot broken card
  • 12:28 apergos: upgraded salt master (plus minion) on palladium to 2014.1.11, all new precise installs will get this version now, other minion upgrades to follow shortly
  • 11:51 godog: temporarily stopped ircecho/icinga-wm on neon, shower of alarms
  • 11:42 godog: killed stray/old copy of diamond that was filling up conntrack on virt1000
  • 09:53 akosiaris: restarted ocg on ocg1001, ocg1002, ocg1003
  • 07:07 _joe_: rolling restart of ocg services
  • 04:29 springle: removed old /var/log/ocg* on ocg1001 and ocg1003 and forced logrotate, / space critical
  • 03:42 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Oct 20 03:42:24 UTC 2014 (duration 42m 23s)
  • 02:50 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1042 (duration: 00m 06s)
  • 02:34 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1066 (duration: 00m 06s)
  • 02:28 logmsgbot: LocalisationUpdate completed (1.25wmf4) at 2014-10-20 02:28:35+00:00
  • 02:16 logmsgbot: LocalisationUpdate completed (1.25wmf3) at 2014-10-20 02:16:31+00:00

October 19

  • 03:36 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Oct 19 03:36:03 UTC 2014 (duration 36m 2s)
  • 02:25 logmsgbot: LocalisationUpdate completed (1.25wmf4) at 2014-10-19 02:25:44+00:00
  • 02:14 logmsgbot: LocalisationUpdate completed (1.25wmf3) at 2014-10-19 02:14:23+00:00

October 18

  • 15:09 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1061, warm up (duration: 00m 06s)
  • 13:49 _joe_: restarted apache on the puppetmasters
  • 03:51 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Oct 18 03:51:18 UTC 2014 (duration 51m 17s)
  • 02:33 logmsgbot: LocalisationUpdate completed (1.25wmf4) at 2014-10-18 02:33:01+00:00
  • 02:20 logmsgbot: LocalisationUpdate completed (1.25wmf3) at 2014-10-18 02:20:20+00:00

October 17

  • 23:46 bd808: Ran trebuchet to create initial tag for iegreview/iegreview
  • 23:35 ori: pooled mw1114 (hhvm api server) to test whether new package resolves overload behavior
  • 23:28 ori: updating hhvm app servers to 3.3.0-20140925+wmf3
  • 22:18 mutante: graceful Apache on antimony - svn fixed, gitblit behind varnish
  • 22:09 mutante: graceful Apache on neon - icinga and tendril done, ishmael = misc-web
  • 21:46 mutante: graceful Apache on stat1001
  • 21:25 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool es1004, es1007, es1010 (duration: 00m 07s)
  • 21:19 mutante: graceful Apache on netmon1001
  • 21:13 mutante: graceful Apache on puppetmaster
  • 20:34 andrewbogott: removed some stray .zip files from /tmp on ocg1002
  • 20:16 ori: disabled puppet on osmium to debug hhvm
  • 15:36 paravoid: killed tampa config remnants on all cr1/cr2s
  • 15:30 bblack: restarted puppetmasters
  • 14:44 _joe_: load test on hhvm done
  • 14:26 _joe_: load test on the hhvm cluster
  • 10:41 akosiaris: uploaded apertium 3.3 on apt.wikimedia.org (trusty-wikimedia)
  • 09:17 _joe_: manually killed long-running stuck processes on ocg1001, moving to the rest of the cluster
  • 08:56 _joe_: restarted the ocg cluster
  • 08:44 akosiaris: uploaded cg3 and lttoolbox on apt.wikimedia.org
  • 05:37 _joe_: depooling both hhvm api appservers
  • 04:52 springle: xtrabackup clone es1010 to es2008
  • 04:51 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Oct 17 04:51:02 UTC 2014 (duration 51m 1s)
  • 04:17 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: reduce load on db1059 (duration: 00m 21s)
  • 04:05 springle: upgrade es1010 to trusty (clone failed, needs trusty)
  • 03:54 springle: xtrabackup clone es1010 to es2008
  • 03:53 springle: xtrabackup clone es1007 to es2006
  • 03:37 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool es1007 and es1010 (duration: 00m 09s)
  • 03:13 logmsgbot: LocalisationUpdate completed (1.25wmf4) at 2014-10-17 03:13:33+00:00
  • 02:39 logmsgbot: LocalisationUpdate completed (1.25wmf3) at 2014-10-17 02:39:26+00:00
  • 01:16 mutante: powering up server formerly known as cp1001
  • 00:47 logmsgbot: hoo Synchronized php-1.25wmf4/extensions/Wikidata/: Fix ORMTable usage, IE 11 freeze bug and adopt to further core changes (duration: 00m 14s)

October 16

  • 23:07 mutante: RT - removed global permission for privileged users to create tickets - should not affect anyone because users are either not privileged or get this from other groups - need it to be flexible about readonly queues in RT - let me know if any issues
  • 22:15 K4-713: undo last payments settings change
  • 22:09 K4-713: payments settings updates
  • 20:15 logmsgbot: reedy Synchronized database lists: (no message) (duration: 00m 18s)
  • 20:15 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 17s)
  • 20:10 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.24wmf4
  • 20:07 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikipedias to 1.25wmf3
  • 20:06 logmsgbot: reedy Synchronized database lists: (no message) (duration: 00m 19s)
  • 20:06 logmsgbot: reedy Finished scap: testwiki to 1.25wmf4 (duration: 46m 21s)
  • 19:36 K4-713: payments localsettings updates - supported countries and fraud filter settings
  • 19:19 logmsgbot: reedy Started scap: testwiki to 1.25wmf4
  • 19:09 K4-713: updated payments to 14c415fcfc3cade9a1
  • 18:22 ottomata: restarted varnishkafka on cp3021
  • 17:47 logmsgbot: ejegg Synchronized php-1.25wmf2/extensions/CentralNotice/: Update CentralNotice (duration: 00m 04s)
  • 17:29 logmsgbot: ejegg Synchronized php-1.25wmf3/extensions/CentralNotice/: Update CentralNotice (duration: 00m 08s)
  • 17:29 bblack: depooled mw1114 for api
  • 15:31 logmsgbot: marktraceur Synchronized wmf-config/Wikibase.php: [SWAT] Define client CSS classes for new wikidata badges (duration: 00m 05s)
  • 15:30 marktraceur: Sorry, that was in fact adding NS_PROPERTY to the search configuration, mistyped.
  • 15:29 logmsgbot: marktraceur Synchronized wmf-config/InitialiseSettings.php: [SWAT] Enable wgCopyUploadsFromSpecialUpload on testwiki (duration: 00m 05s)
  • 15:26 logmsgbot: marktraceur Synchronized php-1.25wmf3/extensions/Wikidata/: [SWAT] [wmf3] Update CSS for Wikidata badges (duration: 00m 11s)
  • 15:25 _joe_: repooling mw1189
  • 15:25 logmsgbot: marktraceur Synchronized php-1.25wmf2/extensions/Wikidata/: [SWAT] [wmf2] Update CSS for Wikidata badges (duration: 00m 11s)
  • 15:11 logmsgbot: marktraceur Synchronized wmf-config/InitialiseSettings.php: [SWAT] Enable wgCopyUploadsFromSpecialUpload on testwiki, Add commons to wgImportSources for sewikimedia (duration: 00m 05s)
  • 15:05 logmsgbot: marktraceur Synchronized wmf-config/InitialiseSettings.php: Re-enable prerendering of thumbnails for new files. (duration: 00m 05s)
  • 15:02 andrewbogott: removed 'publicKey' and 'accessKey' from ldap user records -- they were obsolete and making everyone nervous
  • 14:08 _joe_: depooling mw1189 from the api pool, reimaging with hhvm
  • 10:11 hashar: Updated our Jenkins Job Builder forked repository ( ee80dbc..7ad4386 ). No job configuration impact.
  • 09:52 paravoid: rebooting ms-be1003, sdn3/xfs troubles
  • 09:30 paravoid: powercycling tmh1002, unresponsive, stuck, no vga output
  • 08:44 godog: powercycle ms-be1007, no ssh and no console
  • 07:53 hashar: Jenkins: upgrading PHPUnit from 3.7.28 to 3.7.37 164683 wikitech-l announce
  • 04:44 springle: xtrabackup clone es1004 to es2001
  • 04:24 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Oct 16 04:24:21 UTC 2014 (duration 24m 20s)
  • 02:46 logmsgbot: LocalisationUpdate completed (1.25wmf3) at 2014-10-16 02:46:46+00:00
  • 02:25 logmsgbot: LocalisationUpdate completed (1.25wmf2) at 2014-10-16 02:24:55+00:00
  • 01:31 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1061 (duration: 00m 07s)
  • 00:49 Krinkle: Zuul queue made unstuck by fixing the clogged build (see bug 72113)
  • 00:14 Krinkle: Jenkins queue is stuck (99% free executors, but it's not running any of Zuul's pending jobs)

October 15

  • 23:10 logmsgbot: maxsem Synchronized php-1.25wmf2/extensions/MobileFrontend/: (no message) (duration: 00m 04s)
  • 23:04 logmsgbot: maxsem Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/166886/ (duration: 00m 05s)
  • 21:37 ori: restarting hhvm on mw1114, this time with luasandbox
  • 21:20 hoo: Gracefulled apache on mw1115
  • 21:06 ejegg: updated crm from 05e5388df34059c651223d53fb2986ac1c39a2d9 to d6a75b6df4482de61da372fa653902db7ca12766
  • 21:02 legoktm_: running mwscript extensions/CentralAuth/maintenance/migrateAccount.php on terbium for broken accounts (bug 61876)
  • 20:48 mutante: deleting/shredding ishmael cert/keys from neon
  • 20:41 hoo: Deleted 147 orphan wb_terms entries (bug 71914)
  • 20:40 ori: restarting apache on mw1115 to test luasandbox
  • 20:25 logmsgbot: yurik Synchronized wmf-config/mobile.php: Disable font for ZeroBanner (duration: 01m 05s)
  • 20:17 hoo: Deleted ten orphan wb_entity_per_page rows on wikidata
  • 20:17 logmsgbot: yurik Synchronized php-1.25wmf3/extensions/ZeroPortal: updating ZeroPortal to master (duration: 01m 11s)
  • 20:13 arlolra: restarted ocg service
  • 20:02 logmsgbot: yurik Synchronized php-1.25wmf2/extensions/ZeroPortal: updating ZeroPortal to master (duration: 01m 15s)
  • 19:47 logmsgbot: yurik Synchronized php-1.25wmf3/extensions/ZeroBanner/: Latest ZeroBanner (duration: 01m 11s)
  • 19:37 logmsgbot: yurik Synchronized php-1.25wmf2/extensions/ZeroBanner/: Latest ZeroBanner (duration: 01m 07s)
  • 19:05 ottomata: deployed webstatscollector 0.4 on oxygen (filter) and gadolinium (collector)
  • 18:46 ori: adjusting pybal weight for mw1114 back up to 20 to confirm that leak is in luasandbox
  • 17:57 ori_: installed lua5.1 on mw1114 so I can switch scribunto to luastandalone and thus potentially isolate the leak to luasandbox
  • 15:38 andrewbogott: running sync-common on virt1000
  • 15:36 logmsgbot: marktraceur Synchronized php-1.25wmf3/extensions/OpenStackManager/: [SWAT] [wmf3] Make list=novainstances available to anons (duration: 00m 06s)
  • 15:36 logmsgbot: marktraceur Synchronized php-1.25wmf2/extensions/OpenStackManager/: [SWAT] [wmf2] Make list=novainstances available to anons (duration: 00m 05s)
  • 15:22 paravoid: AMS-IX renumbering: remove old IP from interface, migration over; > 75% of total peers migrated, accounting for much more bandwidth/routes
  • 15:13 logmsgbot: marktraceur Synchronized php-1.25wmf3/extensions/VisualEditor/modules/ve-mw/ui: [SWAT] [wmf3] Update OOjs UI to v0.1.0-pre (d74a46ca6a) and VisualEditor-MediaWiki to Ie06056b (duration: 00m 05s)
  • 15:12 logmsgbot: marktraceur Synchronized php-1.25wmf3/resources/lib/oojs-ui: [SWAT] [wmf3] Update OOjs UI to v0.1.0-pre (d74a46ca6a) and VisualEditor-MediaWiki to Ie06056b (duration: 00m 06s)
  • 14:32 paravoid: AMS-IX renumbering: move all remaining ASNs to the new space
  • 14:20 Coren: Not reimaging mw1035 after all; hhvm is in our base, killing our ramz.
  • 14:16 paravoid: AMS-IX renumbering: peering with (renumbered) top-10 ASNs + ASNs with large number of prefixes
  • 14:12 Coren: reimaging mw1035 for great justice!!! (HHVM)
  • 14:11 _joe_: powercycling mw1205, down since this morning, console blank
  • 13:29 Jeff_Green: dist-upgrade and reboot indium
  • 12:40 hashar: disabled/reenabled gearman plugin at https://integration.wikimedia.org/ci/manage
  • 12:34 hashar: Zuul frozen \O/
  • 11:38 _joe_: depooling mw1114 again
  • 10:59 paravoid: AMS-IX renumbering: adding second IP, peering with RS1
  • 10:47 _joe_: repooled mw1114 with reduced load, using jemalloc with prof_leak enabled for sampling. will depool again soon
  • 10:27 _joe_: depooling mw1114, stopping puppet for debugging purposes
  • 09:24 godog: enable container sync for commons containers
  • 08:09 hashar: restarting Jenkins
  • 08:06 hashar: Jenkins: upgrading Gearman plugin to Patchset 9 of https://review.openstack.org/#/c/125755/
  • 07:52 springle: ongoing schema changes rev_content_(model|format) multiple shards, ok to kill osc_host.sh jobs on terbium in emergency
  • 06:58 _joe_: restarting hhvm on mw1114 to avoid memory exhaustion
  • 05:54 K4-713: updated payments to 944596744e0de23fee098
  • 03:34 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Oct 15 03:34:50 UTC 2014 (duration 34m 49s)
  • 02:28 logmsgbot: LocalisationUpdate completed (1.25wmf3) at 2014-10-15 02:28:00+00:00
  • 02:15 logmsgbot: LocalisationUpdate completed (1.25wmf2) at 2014-10-15 02:15:31+00:00

October 14

  • 23:42 logmsgbot: catrope Synchronized php-1.25wmf3/extensions/Collection: SWAT (duration: 00m 08s)
  • 23:42 logmsgbot: catrope Synchronized php-1.25wmf2/extensions/Collection: SWAT (duration: 00m 04s)
  • 22:24 hoo: Gracefulled apache on mw1075
  • 19:16 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 17s)
  • 19:07 logmsgbot: reedy Synchronized images/: (no message) (duration: 00m 14s)
  • 18:52 Reedy: Purged php-1.24wmf18 from mediawiki-appservers
  • 18:41 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias to 1.25wmf3
  • 18:20 logmsgbot: reedy Purged l10n cache for 1.25wmf1
  • 18:15 godog: stop diamond on virt1000 and zirconium to test
  • 17:52 godog: conntrack full on virt1000 and zirconium, suspected diamond collector runaway
  • 16:33 _joe_: repooling mw1114
  • 15:27 _joe_: reimaging mw1114 with HHVM - first server in the API pool; depooling and reinstalling now.
  • 15:22 logmsgbot: marktraceur Synchronized php-1.25wmf3/resources/lib/oojs-ui/: [SWAT] [wmf3] OOjs UI: New pull-through to 837b2f733e to fix a missed dependency (duration: 00m 06s)
  • 15:12 logmsgbot: marktraceur Synchronized php-1.25wmf3/includes/api/ApiQueryBacklinks.php: [SWAT] [wmf3] API: Fix ApiQueryBacklinks redirlinks (duration: 00m 05s)
  • 15:11 logmsgbot: marktraceur Synchronized php-1.25wmf2/includes/api/ApiQueryBacklinks.php: [SWAT] [wmf2] API: Fix ApiQueryBacklinks redirlinks (duration: 00m 06s)
  • 13:56 godog: enable container sync on non-commons sharded containers
  • 12:27 hashar: Jenkins restarting
  • 12:25 hashar: Jenkins: upgrading Gearman plugin to fix jobs registrations ( cherry picked https://review.openstack.org/#/c/125755/ and compiled it via maven ).
  • 10:38 godog: enable container sync on non-sharded originals containers
  • 09:46 godog: upload python-elasticsearch to trusty-wikimedia
  • 09:44 _joe_: running sync-common on mw1163
  • 03:34 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Oct 14 03:34:53 UTC 2014 (duration 34m 52s)
  • 02:27 logmsgbot: LocalisationUpdate completed (1.25wmf3) at 2014-10-14 02:27:23+00:00
  • 02:15 logmsgbot: LocalisationUpdate completed (1.25wmf2) at 2014-10-14 02:15:19+00:00
  • 01:42 springle: restarted gitblit

October 13

  • 22:17 akosiaris: ran update-rc.d -f puppetmaster remove on palladium/strontium
  • 21:47 akosiaris: restarting apache on palladium, strontium
  • 20:20 _joe_: load test done. HHVM is awesome
  • 19:51 _joe_: load test on HHVM starting
  • 19:13 logmsgbot: oblivian Synchronized wmf-config/CommonSettings.php: Serving 5% of anons with HHVM (duration: 00m 12s)
  • 17:12 logmsgbot: oblivian Synchronized wmf-config/CommonSettings.php: Serving 2% of anons with HHVM (duration: 00m 06s)
  • 16:19 _joe_: restarting gitblit, stuck in GC probably
  • 14:30 _joe_: rolling restart of the ocg cluster
  • 13:10 godog: enable container sync on wikibooks originals as test
  • 11:01 logmsgbot: hoo Synchronized wmf-config/InitialiseSettings.php: Add edh-www.adw.uni-heidelberg.de to the wgCopyUploadsDomains whitelist (duration: 00m 08s)
  • 10:30 godog: enable container sync as a test on wiktionary*-local-public
  • 09:49 godog: test enable container sync on wikibooks-it-local-thumb
  • 08:24 godog: stopping swift on ms-be1013, out of disk space on /
  • 03:33 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Oct 13 03:33:51 UTC 2014 (duration 33m 50s)
  • 02:37 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: s1 api traffic prefer db1065 and db1066 (duration: 00m 07s)
  • 02:28 logmsgbot: LocalisationUpdate completed (1.25wmf3) at 2014-10-13 02:28:12+00:00
  • 02:16 logmsgbot: LocalisationUpdate completed (1.25wmf2) at 2014-10-13 02:16:03+00:00
  • 02:10 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1062, warm up (duration: 00m 07s)
  • 01:58 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: s4 api traffic prefer db1059 (duration: 00m 08s)

October 12

  • 13:21 logmsgbot: reedy Synchronized php-1.25wmf3/extensions/CentralAuth/: (no message) (duration: 00m 18s)
  • 13:19 logmsgbot: reedy Synchronized php-1.25wmf3/includes/templates/: Unbreak user signup (duration: 00m 15s)
  • 03:30 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Oct 12 03:30:17 UTC 2014 (duration 30m 16s)
  • 02:28 logmsgbot: LocalisationUpdate completed (1.25wmf3) at 2014-10-12 02:28:03+00:00
  • 02:16 logmsgbot: LocalisationUpdate completed (1.25wmf2) at 2014-10-12 02:16:12+00:00

October 11

  • 23:39 Reedy: killed both logstash events on logstash100[23]. Started logstash again after
  • 23:33 Reedy: killed both logstash events on logstash1001. Started logstash again after
  • 23:23 Reedy: Started logstash on logstash1001
  • 23:21 bd808: logstash not showing any udp2log events after 2014-10-10T01:42:22.000Z
  • 16:09 logmsgbot: hoo Synchronized php-1.25wmf3/extensions/CentralAuth/: Deploying forgotten backport from Thursday: SpecialCentralAutoLogin: Fix getting files after file layout change (duration: 00m 08s)
  • 06:01 bblack: put ms-fe.svc.codfw.wmnet into downtime for the next two days, because I'm tired of getting paged about it :p
  • 03:35 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Oct 11 03:35:00 UTC 2014 (duration 34m 59s)
  • 02:29 logmsgbot: LocalisationUpdate completed (1.25wmf3) at 2014-10-11 02:29:32+00:00
  • 02:17 logmsgbot: LocalisationUpdate completed (1.25wmf2) at 2014-10-11 02:17:18+00:00

October 10

  • 19:41 Coren: mw1026 rebuild complete (now with HHVM goodness in every bite!)
  • 18:02 Coren: begin reimaging of mw1026
  • 17:32 ejegg: updated civicrm from c684b07805ad75f10796fd4dbb82ece4818a7aa3 to 05e5388df34059c651223d53fb2986ac1c39a2d9
  • 17:17 Coren: mw1027 rebuild complete (now with HHVM goodness in every bite!)
  • 16:18 paravoid: stopping swift on ms-be1013, debugging
  • 14:07 hashar: Disconnecting / reconnecting Jenkins/Zuul gearman as per https://bugzilla.wikimedia.org/show_bug.cgi?id=63760#c12
  • 13:29 logmsgbot: manybubbles Synchronized wmf-config/CirrusSearch-common.php: Add configuration so cirrus can build an index to speed up regex searches (duration: 00m 04s)
  • 11:22 godog: rolling restart of container-server on ms-be1*
  • 11:13 godog: rolling restart of container-server on ms-be2*
  • 07:12 legoktm: running migratePass0 across all wikis
  • 06:49 _joe_: load testing done
  • 06:22 _joe_: doing some load testing on HHVM (api)
  • 04:38 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Oct 10 04:38:38 UTC 2014 (duration 38m 37s)
  • 04:22 ^d: elasticsearch upgrade from 1.3.2 -> 1.3.4 complete for all 18 nodes. Sporadic icinga warnings about health should go away now
  • 04:19 springle: upgrade db1046 mariadb 10
  • 03:44 springle: enable purging of old eventlogging data from specific tables on m2-master, as per analytics@ discussion
  • 03:12 logmsgbot: LocalisationUpdate completed (1.25wmf3) at 2014-10-10 03:12:51+00:00
  • 02:39 logmsgbot: LocalisationUpdate completed (1.25wmf2) at 2014-10-10 02:39:23+00:00
  • 00:53 legoktm: running initSiteStats.php on wikidatawiki
  • 00:47 legoktm: ran updateArticleCount.php --wiki=ckbwiki (bug 71884)

October 9

  • 23:57 logmsgbot: maxsem Synchronized php-1.25wmf2/resources/: (no message) (duration: 00m 03s)
  • 23:57 logmsgbot: maxsem Synchronized php-1.25wmf3/resources/: (no message) (duration: 00m 04s)
  • 23:55 logmsgbot: maxsem Synchronized php-1.25wmf2/extensions/OpenStackManager/: (no message) (duration: 00m 04s)
  • 23:52 logmsgbot: maxsem Synchronized php-1.25wmf3/extensions/MobileApp: (no message) (duration: 00m 03s)
  • 23:50 logmsgbot: maxsem Synchronized php-1.25wmf2/extensions/MobileApp: (no message) (duration: 00m 04s)
  • 23:45 logmsgbot: maxsem Synchronized php-1.25wmf2/extensions/Flow/: (no message) (duration: 00m 09s)
  • 23:43 logmsgbot: maxsem Synchronized php-1.25wmf3/extensions/Wikidata/: (no message) (duration: 00m 10s)
  • 23:40 logmsgbot: maxsem Synchronized php-1.25wmf2/extensions/Wikidata/: (no message) (duration: 00m 10s)
  • 22:24 K4-713: updated payments wiki to 17f822a64742bd13e
  • 20:33 subbu: deployed parsoid version 644071d2
  • 20:03 Jeff_Green: rebooting samarium
  • 19:51 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 15s)
  • 19:42 manybubbles: upgrading elastic1014
  • 19:34 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.25wmf3
  • 19:31 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikipedias to 1.25wmf2
  • 19:21 logmsgbot: reedy Finished scap: testwiki to 1.25wmf3 and build l10n cache (take 2) (duration: 30m 00s)
  • 18:51 logmsgbot: reedy Started scap: testwiki to 1.25wmf3 and build l10n cache (take 2)
  • 18:51 bd808: cherry-picked I3ae9edab2505c37945fe66863721913a6d33223c to scap
  • 18:42 logmsgbot: reedy scap failed: TypeError bufsize must be an integer (duration: 08m 33s)
  • 18:34 logmsgbot: reedy Started scap: testwiki to 1.25wmf3 and build l10n cache
  • 17:56 Coren: begin reimaging of mw1027
  • 17:55 Coren: done reimaging of mw1028. Now hhvm_appserver
  • 16:58 _joe_: gracefully restarted API apaches again to recover from 500s
  • 16:43 godog: re-enable puppet on ms-fe/ms-be in eqiad
  • 16:39 godog: re-enable puppet on ms-fe/ms-be in codfw
  • 16:23 logmsgbot: oblivian gracefulled all apaches
  • 15:34 hashar: restarted Zuul
  • 15:31 Coren: begin reimaging of mw1028
  • 15:31 Coren: done reimaging of mw1029. Now hhvm_appserver
  • 15:28 logmsgbot: manybubbles Synchronized php-1.25wmf2/extensions/VisualEditor/: SWAT deploy VE cherry-pick (duration: 00m 06s)
  • 15:26 andrewbogott: upgraded wikitech-static to 1.25wmf2
  • 14:02 akosiaris: updated pybal on palladium for citoid
  • 13:54 Coren: begin reimaging of mw1029
  • 12:01 logmsgbot: reedy Purged l10n cache for 1.24wmf22
  • 11:59 springle: converted some librenms tables to innodb on db1001 m1-master. should be a no-op
  • 11:57 springle: xtrabackup db1016 to db2010
  • 11:39 manybubbles: starting upgrade of elastic1009
  • 11:11 _joe_: reenabled puppet on mw*
  • 11:11 godog: disabled puppet in ms-fe/ms-be in eqiad/codfw to merge container-sync changes
  • 10:35 _joe_: disabling puppet on most mw* hosts while testing apache changes
  • 08:17 _joe_: repooling mw102[3-5],mw1053 in the hhvm pool
  • 07:15 _joe_: reimaging mw102[3-5] to hhvm
  • 07:02 _joe_: reinstalling mw1053
  • 03:33 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Oct 9 03:33:47 UTC 2014 (duration 33m 46s)
  • 02:30 logmsgbot: LocalisationUpdate completed (1.25wmf2) at 2014-10-09 02:30:03+00:00
  • 02:17 logmsgbot: LocalisationUpdate completed (1.25wmf1) at 2014-10-09 02:17:53+00:00

October 8

  • 23:31 mutante: importing xml dump to cawikimedia
  • 23:28 andrewbogott: running sync-common on virt1000
  • 23:27 logmsgbot: demon Synchronized php-1.25wmf2/extensions/OpenStackManager: (no message) (duration: 00m 06s)
  • 23:23 logmsgbot: demon Synchronized php-1.25wmf2/extensions/Flow: (no message) (duration: 00m 05s)
  • 23:14 logmsgbot: demon Synchronized php-1.25wmf2/extensions/CommonsMetadata: (no message) (duration: 00m 06s)
  • 22:40 mutante: virt0 - deleted salt key, revoked puppet cert, removed from site.pp
  • 22:39 Reedy: Removed openjdk-6-* from logstash100[1-3]
  • 22:07 subbu: updated OCG to version def24eca
  • 22:07 mutante: tin - deleted empty pmtpa dsh group files
  • 22:02 mutante: tin - there are dozens of dsh groups that were removed from the repo long ago but never purged; it isn't easy to tell which might still be in use, so deleting them all and letting puppet recreate them might be risky?
  • 20:48 legoktm: currently running /home/legoktm/fixBug71749.php on terbium
  • 19:49 Reedy: logstash upgraded to 1.4.2-1 on logstash100[1-3]
  • 19:46 Reedy: Created flow tables on officewiki
  • 17:04 _joe_: load testing done
  • 16:44 _joe_: doing some load testing on the hhvm servers
  • 16:09 Reedy: elasticsearch upgraded on logstash1001 to 1.3.4
  • 16:07 Reedy: elasticsearch upgraded on logstash100[23] to 1.3.4
  • 16:07 Reedy: elasticsearch upgraded on logstash[
  • 15:08 logmsgbot: anomie Synchronized php-1.25wmf2/extensions/VisualEditor/lib/ve/src/ce/ve.ce.Surface.js: SWAT: Revert "ve.ce.Surface: Magic workaround for broken Firefox cursoring" gerrit:164593 (duration: 00m 09s)
  • 15:04 manybubbles: upgrading elastic1002 now
  • 14:53 _joe_: repooling mw1163
  • 14:20 hasharBusy: disabled puppet on gallium to make sure a zuul config change sticks. 165481
  • 14:19 manybubbles: fixed missing elasticsearch extension jar file and brought elastic1001 back up. git fat betrayed us.
  • 14:14 hasharBusy: hard restarting zuul
  • 14:03 manybubbles: upgrading elastic1001 uncovered a bug in our highlighter that I have yet to diagnose. I removed that server from the rotation so we'll continue to use the old version.
  • 13:49 _joe_: depooling && reimaging mw1163
  • 12:44 manybubbles: upgraded elastic1001 to Elasticsearch 1.3.2 -> 1.3.4, experimental highlighter 0.0.11 -> 0.0.12, and installed trigram accelerated regex search 0.0.1
  • 12:32 manybubbles: deploying new elasticsearch plugins in preparation for minor Elasticsearch version upgrade today
  • 11:02 logmsgbot: reedy Synchronized docroot and w: good riddance to bad docroots (duration: 00m 16s)
  • 09:27 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: isolate api traffic on s2 to db1054 and db1060 (duration: 01m 20s)
  • 09:03 springle: killed masses of sleeping connections on s2 slaves
  • 08:11 paravoid: powercycling rhenium, unresponsive
  • 07:55 springle: restart db2011
  • 04:31 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Oct 8 04:31:03 UTC 2014 (duration 31m 2s)
  • 03:18 logmsgbot: LocalisationUpdate completed (1.25wmf2) at 2014-10-08 03:18:44+00:00
  • 02:40 logmsgbot: LocalisationUpdate completed (1.25wmf1) at 2014-10-08 02:40:48+00:00
  • 02:02 logmsgbot: tstarling Finished scap: (no message) (duration: 09m 01s)
  • 01:53 logmsgbot: tstarling Started scap: (no message)
  • 01:35 logmsgbot: tstarling scap failed: CalledProcessError Command '('/usr/bin/git', 'rev-list', '-1', '@{upstream}')' returned non-zero exit status 128 (duration: 00m 14s)
  • 01:35 logmsgbot: tstarling Started scap: (no message)
  • 01:32 logmsgbot: tstarling scap failed: CalledProcessError Command '('/usr/bin/git', 'rev-list', '-1', '@{upstream}')' returned non-zero exit status 128 (duration: 00m 14s)
  • 01:31 logmsgbot: tstarling Started scap: (no message)
  • 01:16 logmsgbot: tstarling scap failed: CalledProcessError Command '('/usr/bin/git', 'rev-list', '-1', '@{upstream}')' returned non-zero exit status 128 (duration: 00m 25s)
  • 01:16 logmsgbot: tstarling Started scap: update for Wikidata crash bug
  • 00:41 mutante: searchidx1001 - same, fixed duplicate salt-minion
  • 00:40 mutante: osmium - salt-minion was running twice, stopped both, killed one, restarted properly
  • 00:38 mutante: cp3016 - why you report failed puppet unlike everyone else but then it works
  • 00:33 springle: long schema changes running from terbium. ok to kill osc_host.sh in emergency
  • 00:01 logmsgbot: ori Synchronized php-1.25wmf2/extensions/WikimediaEvents: Update WikimediaEvents for If9cdde0f0 (duration: 00m 03s)
  • 00:01 logmsgbot: ori Synchronized php-1.25wmf1/extensions/WikimediaEvents: Update WikimediaEvents for If9cdde0f0 (duration: 00m 04s)

October 7

  • 23:29 andrewbogott: restarting every shutoff VM on virt1005
  • 23:20 logmsgbot: maxsem Synchronized wmf-config/InitialiseSettings.php: SWAT: https://gerrit.wikimedia.org/r/165393 (duration: 00m 04s)
  • 22:54 cscott: updated OCG to version c778ea8b898f8ad8c2b7ad9de78a75469e7ed061
  • 22:50 mutante: db68,tarin - revoke the last remaining pmtpa certs
  • 22:48 logmsgbot: ori Synchronized php-1.25wmf1/extensions/WikimediaEvents: Update WikimediaEvents for Ied71b5032: Groundwork for HHVM productivity analysis (duration: 00m 04s)
  • 22:47 mutante: db60,db69-74,es4,es7,es10 - remove from icinga monitoring, puppet certs, salt keys
  • 22:42 logmsgbot: ori Synchronized php-1.25wmf2/extensions/WikimediaEvents: Update WikimediaEvents for Ied71b5032: Groundwork for HHVM productivity analysis (duration: 00m 04s)
  • 22:40 mutante: fenari - revoked puppet cert, rm salt key, rm from icinga ...
  • 22:37 andrewbogott: cycling power on virt1005 -- unresponsive
  • 21:27 mutante: mchenry - revoke puppet cert, clean storedconfigs/rm from icinga
  • 21:04 mutante: dobson - revoke puppet cert, delete from storedconfigs/icinga, deleted from dsh
  • 20:56 K4-713: altered worldpay account settings for France on payments
  • 20:48 mutante: mexia - revoke salt,puppet,monitoring,storedconfigs
  • 20:27 mutante: pdf2/pdf3 - revoked puppet certs, removed from DNS & icinga
  • 19:42 mutante: temp. stopped icinga-wm
  • 19:41 mutante: restarting apache on palladium - mod_passenger fail
  • 19:29 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 23s)
  • 19:29 logmsgbot: reedy Synchronized database lists: (no message) (duration: 00m 20s)
  • 19:20 Reedy: Created EducationProgram tables on cawiki
  • 19:19 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 26s)
  • 19:09 ^d: cleared old files from runs on gallium tmpfs, testing should recover now.
  • 18:45 csteipp: deployed fix for bug 71749
  • 18:43 mutante: sanger - deleted salt key, revoked puppet cert, rm icinga stored config, already out of DNS - Killing sanger.wikimedia.org...done.
  • 18:42 logmsgbot: csteipp Synchronized php-1.25wmf2/extensions/CentralAuth/CentralAuthPlugin.php: (no message) (duration: 00m 06s)
  • 18:34 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias to 1.25wmf2
  • 18:25 ^d: jenkins tmpfs run out of space again, tests failing
  • 16:24 logmsgbot: reedy Synchronized database lists: echo for fawikivoyage (duration: 00m 20s)
  • 16:22 Reedy: Created echo tables on fawikivoyage on extension1 cluster
  • 15:00 logmsgbot: reedy Synchronized docroot and w: (no message) (duration: 00m 14s)
  • 14:31 logmsgbot: reedy Synchronized docroot and w: Fixup noc (duration: 00m 16s)
  • 10:53 godog: restart commons swiftrepl from ms-fe1003 and non-commons from ms-fe1004 to avoid maxing out copper's nic
  • 10:09 godog: start swiftrepl of commons originals eqiad -> codfw
  • 09:56 godog: start swiftrepl of non-commons originals eqiad -> codfw
  • 06:02 logmsgbot: ori Synchronized php-1.25wmf1/includes/objectcache/HashBagOStuff.php: I0b0b5f01: HashBagOStuff: use the value itself as the CAS token (duration: 00m 06s)
  • 06:02 logmsgbot: ori Synchronized php-1.25wmf2/includes/objectcache/HashBagOStuff.php: I0b0b5f01: HashBagOStuff: use the value itself as the CAS token (duration: 00m 07s)
  • 03:28 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Oct 7 03:28:09 UTC 2014 (duration 28m 8s)
  • 02:26 logmsgbot: LocalisationUpdate completed (1.25wmf2) at 2014-10-07 02:26:04+00:00
  • 02:14 logmsgbot: LocalisationUpdate completed (1.25wmf1) at 2014-10-07 02:14:21+00:00
  • 01:16 logmsgbot: ori Synchronized php-1.25wmf1/extensions/Wikidata: Ie92da71 / I44f1dce: Update Wikidata, fixes for serialization issues (duration: 00m 09s)
  • 01:15 logmsgbot: ori Synchronized php-1.25wmf2/extensions/Wikidata: Ie92da71 / I44f1dce: Update Wikidata, fixes for serialization issues (duration: 00m 10s)
  • 00:34 Tim: core dumps were enabled on mw1088, unexpectedly started gathering natural segfault traffic

October 6

  • 23:49 mutante: tarin, nfs-1 - revoked salt key,puppet cert, stored configs
  • 23:06 logmsgbot: maxsem Synchronized php-1.25wmf2/extensions/MobileFrontend/: (no message) (duration: 00m 04s)
  • 23:06 logmsgbot: maxsem Synchronized php-1.25wmf1/extensions/MobileFrontend/: (no message) (duration: 00m 04s)
  • 23:05 logmsgbot: maxsem Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/165099/ https://gerrit.wikimedia.org/r/#/c/164902/ (duration: 00m 04s)
  • 22:28 Tim: on mw1088 debugging crash bug 71542
  • 21:59 hoo: Reverted wd:Q17939676 to 157541810 and edit=sysop
  • 21:47 cscott: updated OCG to version bbdf4c6400cfbbc6030114ad16e1a6f7025eab2c
  • 20:21 awight: backfilled recurring GC glitch from FR #2018, 3342 records affected.
  • 20:16 subbu: deployed parsoid sha 13a53ab3 (deploy repo sha 38d44ada7)
  • 19:03 akosiaris: issued cf disable and halt on nas1-a.pmtpa.wmnet nas1-b.pmtpa.wmnet. They are officially down :)
  • 17:22 _joe_: hhvm load test finished
  • 17:03 _joe_: progressively depooling and repooling hhvm appservers to see performance under load
  • 16:57 Krinkle: git-deploy: Deploying integration/slave-scripts 0b85d48
  • 16:19 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: Set $wgPercentHHVM to 1 (duration: 00m 27s)
  • 15:54 cscott: updated OCG to version aee3712b352f51f96569de0bcccf3facf654e688
  • 15:45 GroggyPanda: deleted graphite data for deployment-rsync02 by hand on labmon1001, since instance has been dead. Need to move to shinken + dynamic host.cfg
  • 15:37 logmsgbot: manybubbles Synchronized php-1.25wmf1/extensions/Wikidata/: SWAT update wikidata (duration: 00m 10s)
  • 15:23 logmsgbot: manybubbles Synchronized php-1.25wmf2/extensions/Wikidata/: SWAT update wikidata (duration: 00m 10s)
  • 15:22 hashar: Zuul jobs proceeding again
  • 15:22 godog: swiftrepl replicating non-sharded originals containers eqiad -> codfw
  • 15:22 logmsgbot: manybubbles Synchronized wmf-config/: SWAT Add tracking categories for files with attribution problems (duration: 00m 06s)
  • 15:19 cscott: ran 'sudo -u ocg -g ocg nodejs-ocg scripts/run-garbage-collect.js -c /home/cscott/config.js' from /home/cscott/ocg/mw-ocg-service in order to clear caches (working around https://gerrit.wikimedia.org/r/164644 ) on ocg100x.eqiad.wmnet
  • 14:51 cmjohnson1: disconnecting Tampa servers
  • 13:46 godog: starting test swiftrepl run on wikibooks eqiad -> codfw
  • 11:49 _joe_: done restarting ocg servers
  • 11:34 _joe_: rolling restart and cleaning of ocg nodes, trying to unlock pdf generation
  • 11:11 mark: Shutdown tarin
  • 11:11 mark: Shutdown sanger
  • 09:27 _joe_: cleaned ocg another time
  • 09:07 mark: Stopped dovecot on sanger
  • 08:06 _joe_: cleaned ocg1001, again
  • 05:57 Nemo_bis: "book creator seems stuck": PDF servers at 97 % CPU, little traffic, enough disk free for about 1 day more
  • 03:26 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Oct 6 03:26:31 UTC 2014 (duration 26m 30s)
  • 02:59 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1060 (duration: 00m 06s)
  • 02:28 logmsgbot: LocalisationUpdate completed (1.25wmf2) at 2014-10-06 02:28:42+00:00
  • 02:17 logmsgbot: LocalisationUpdate completed (1.25wmf1) at 2014-10-06 02:17:40+00:00

October 5

  • 22:28 Coren: Q183 superprotected as a safeguard
  • 22:27 hoo: Q183 is on revision 116786096 again, please don't alter this further!
  • 22:21 qchris: Updated gerrit's hooks-bugzilla to 6e1e659 (with hooks-its at a421db4)
  • 22:11 hoo: WD:Q183 was frozen on version 120566337, see bug 71519 (and others)
  • 21:23 hoo: Bypassed Wikibase restrictions and set https://www.wikidata.org/wiki/Q183 back to old serialization format
  • 20:08 Nemo_bis: 22.03 < Ainali> It was just noticed on svwp village pump that http://stats.wikimedia.org is down
  • 16:39 paravoid: restore ns1 routing to codfw
  • 11:23 paravoid: adding static route for ns1 to rubidium (ns0) on cr1-eqiad to temporarily redirect its traffic while the codfw is offline
  • 03:19 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Oct 5 03:19:55 UTC 2014 (duration 19m 53s)
  • 03:02 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: I707b5754: Enable LuaSandbox profiling when is true (duration: 00m 07s)
  • 02:22 logmsgbot: LocalisationUpdate completed (1.25wmf2) at 2014-10-05 02:22:47+00:00
  • 02:13 logmsgbot: LocalisationUpdate completed (1.25wmf1) at 2014-10-05 02:13:26+00:00

October 4

  • 21:08 _joe_: cleaning ocg1001 tmpfs from a 32 gb pdf file
  • 19:59 jgage: restarted pdns on virt1000 for ldap config update
  • 07:08 springle: powercycle es1004
  • 03:27 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Oct 4 03:27:20 UTC 2014 (duration 27m 19s)
  • 02:25 logmsgbot: LocalisationUpdate completed (1.25wmf2) at 2014-10-04 02:25:04+00:00
  • 02:15 logmsgbot: LocalisationUpdate completed (1.25wmf1) at 2014-10-04 02:15:02+00:00
  • 01:01 bblack: depooling cp1045 for persistent cache wipe
  • 00:01 andrewbogott: updated the default labs precise image: updated ldap setup, new /var/log partition

October 3

  • 22:46 bd808: Restarting zuul on gallium
  • 22:40 bd808: Trying a soft restart of zuul on gallium
  • 22:37 bd808: NoConnectedServersError("No connected Gearman servers") in zuul.log on gallium
  • 22:33 bd808|deploy: Updated integration/phpunit to 6c1d11d (Regenerate autoloader)
  • 22:31 subbu: restarted Parsoid servers after another gradual cpu load creep
  • 22:19 logmsgbot: aaron Synchronized wmf-config/InitialiseSettings.php: Fixed the parser cache type for labswiki (duration: 00m 03s)
  • 21:55 andrewbogott: updated the default labs trusty image: updated packages, updated ldap setup, new /var/log partition
  • 20:37 ori: disabling puppet on rbf1002 to test bloom filter config
  • 20:31 logmsgbot: manybubbles Synchronized wmf-config/CirrusSearch-labs.php: noop update to sync beta configs (duration: 00m 04s)
  • 20:30 cscott: (the above was on ocg100x.eqiad.wmnet)
  • 20:30 cscott: ran 'sudo -u ocg -g ocg nodejs-ocg scripts/run-garbage-collect.js -c /home/cscott/config.js' from /home/cscott/ocg/mw-ocg-service in order to clear caches (working around https://gerrit.wikimedia.org/r/164644 )
  • 20:23 andrewbogott: restarted zuul
  • 20:22 logmsgbot: ori Synchronized wmf-config/mc.php: Ie1ed821a7: Set bloom cache config (duration: 00m 03s)
  • 19:34 ori: when running puppet merge: fatal: Unable to create '/var/lib/git/operations/puppet/.git/refs/remotes/origin/production.lock': File exists.
  • 19:28 hashar: Restarting Zuul sorry :-/
  • 19:26 hashar: Zuul in some kind of death loop
  • 19:05 mutante: purging old 'searchqa' scripts and logs from iron (gerrit 164429 removes from puppet)
  • 18:46 mutante: restored dbtree from manual backup (should have been synced by scripts)
  • 18:25 logmsgbot: aaron Synchronized php-1.25wmf2/extensions/CentralAuth: (no message) (duration: 00m 05s)
  • 18:14 logmsgbot: maxsem Synchronized php-1.25wmf2/extensions/MobileFrontend/: https://gerrit.wikimedia.org/r/#/c/164543/ (duration: 00m 04s)
  • 18:14 logmsgbot: maxsem Synchronized php-1.25wmf1/extensions/MobileFrontend/: https://gerrit.wikimedia.org/r/#/c/164543/ (duration: 00m 04s)
  • 17:53 bblack: restarted parsoid varnishes
  • 17:30 andrewbogott: enabling puppet on tungsten which is disabled for mysterious reasons
  • 17:04 gwicke: restarting parsoids after CPU spike
  • 16:50 ejegg: updated crm from 6f66294607d132230ef82fb4867c37a8700bfd4e to ef68d9fe98a64e819ebbdddbe5e13f83037607ce
  • 15:34 _joe_: purging varnish cache for parsoid (RT 8528)
  • 15:19 mark: shutdown pdf2 & pdf3
  • 15:15 bblack: adding 10.2.1.0/24 aggregate in cr-[12].codfw
  • 15:14 bblack: dropping 10.2.1.0/24 aggregate + static routes in cr2-pmtpa
  • 14:37 bblack: testing dns server upgrade on baham
  • 14:35 logmsgbot: manybubbles Synchronized wmf-config/CirrusSearch-labs.php: Noop - just keeps beta config in sync (duration: 00m 04s)
  • 14:18 Jeff_Green: launched iodine:/opt/otrs/bin/otrs.RebuildFulltextIndex.pl per bugzilla #64473
  • 13:07 Reedy: Updated minor_mime to varbinary(100) on image|filearchive|oldimage on foundationwiki
  • 12:30 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool es1004 (duration: 00m 06s)
  • 11:51 logmsgbot: reedy Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 15s)
  • 11:49 logmsgbot: aude Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 10s)
  • 11:37 springle: shutdown db60 db68 db69 db71 db72 db73 db74 es4 es7 es10
  • 10:43 mark: Shutdown amaranth.toolserver.org's switchport on asw-d-pmtpa
  • 09:35 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1063 (duration: 00m 07s)
  • 08:31 akosiaris: deleting snmp community from nas1-a, nas1-b. I guess librenms is going to start complaining
  • 08:20 akosiaris: unexporting, offline, destroying /vol/home_pmtpa on nas1-a
  • 06:59 _joe_: depooling mw1022, then reimaging it
  • 04:22 springle: upgrade db1063 mariadb 10
  • 04:16 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Oct 3 04:16:35 UTC 2014 (duration 16m 34s)
  • 04:04 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1063 (duration: 00m 08s)
  • 03:17 logmsgbot: LocalisationUpdate completed (1.25wmf2) at 2014-10-03 03:17:28+00:00
  • 02:44 logmsgbot: LocalisationUpdate completed (1.25wmf1) at 2014-10-03 02:44:39+00:00

October 2

  • 23:15 logmsgbot: maxsem Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/164483 (duration: 00m 03s)
  • 23:06 logmsgbot: maxsem Synchronized php-1.25wmf2/extensions/MobileFrontend/: (no message) (duration: 00m 04s)
  • 22:53 logmsgbot: reedy Synchronized wmf-config/: Experimentally enable vips for larger (>50MP) tiff files (duration: 00m 15s)
  • 22:38 mutante: icinga_broken_due_to_missing_hostgroup_counter incremented
  • 21:43 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: Disabling prerendering of images from this morning's SWAT (duration: 00m 04s)
  • 21:33 bblack: added LVS BGP config setup to cr[12]-codfw
  • 20:46 logmsgbot: ori Synchronized php-1.24wmf22/extensions/NavigationTiming: Update NavigationTiming for cherry-picks (duration: 00m 03s)
  • 20:44 logmsgbot: ori Synchronized php-1.25wmf1/extensions/NavigationTiming: Update NavigationTiming for cherry-picks (duration: 00m 04s)
  • 20:11 bblack: stopping -> starting uwsgi/apache -type stuff on tungsten
  • 19:55 aude: populated sites table for fawikivoyage
  • 19:27 bd808: hosts that failed Trebuchet update of scap: virt0.wikimedia.org, fenari.wikimedia.org, mw1110.eqiad.wmnet, mw1053.eqiad.wmnet. mw1053.eqiad.wmnet only failed checkout
  • 19:23 bd808: Trebuchet reports for scap sync "231/234 minions completed fetch; 230/234 minions completed checkout" Some stale entries need to be removed from Trebuchet redis cache
  • 19:21 bd808: Updated scap to eff0d01 (Fix format specifier for error message)
  • 19:12 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: fawikivoyage (duration: 00m 16s)
  • 19:12 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: fawikivoyage
  • 19:11 logmsgbot: reedy Synchronized database lists: fawikivoyage (duration: 00m 16s)
  • 18:47 mutante: graceful'ed apache on mw1030,mw1164
  • 18:44 logmsgbot: reedy Purged l10n cache for 1.24wmf21
  • 18:43 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.25wmf2
  • 18:41 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias to 1.25wmf2
  • 18:39 logmsgbot: reedy Finished scap: testwiki to 1.25wmf2 (duration: 24m 19s)
  • 18:32 cmjohnson1: replacing failed disk es1005
  • 18:15 logmsgbot: reedy Started scap: testwiki to 1.25wmf2
  • 17:57 manybubbles: going to try to restart lsearchd on the misc pool machines to see if that makes it responsive
  • 17:42 ejegg: updated payments-wiki from 83464deed3b66da655ca5d1086852237c4793b71 to 9417bbd95057a87824be157dbbb5965a1f09d202
  • 17:26 logmsgbot: aaron Synchronized php-1.25wmf1/maintenance/findMissingFiles.php: aa2eb3c0de08256822a2b0c985ebb3a6145d28cd (duration: 00m 05s)
  • 16:30 ori: graceful'd all apaches for I98bcdbfc7: mediawiki: add vhost_combined log format to apache2.conf
  • 15:32 logmsgbot: demon Synchronized php-1.25wmf1/extensions/Wikidata: (no message) (duration: 00m 20s)
  • 15:32 logmsgbot: demon Synchronized php-1.25wmf1/includes/jobqueue/jobs/ThumbnailRenderJob.php: (no message) (duration: 00m 05s)
  • 15:21 godog: shutting down nfs1
  • 15:15 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 06s)
  • 15:11 logmsgbot: demon Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 04s)
  • 15:11 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 04s)
  • 15:09 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 04s)
  • 15:08 mark: Replaced exim4-daemon-light by exim4-daemon-heavy on tools-mail
  • 15:06 logmsgbot: demon Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 04s)
  • 15:05 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 05s)
  • 13:30 hashar: Zuul back around :]
  • 13:23 hashar: Zuul deadlocked somehow again :(
  • 12:32 godog: rolling-restart swift on ms-be1*, saw increased load possibly as a cause of 5xx spike
  • 12:07 godog: restarting rsyslog in eqiad
  • 12:03 akosiaris: restarted apparmor throughout the fleet
  • 11:58 hashar: Migrated all mediawiki-core-regression* jobs to Zuul cloner bug 71549
  • 10:57 godog: restarted rsyslog in codfw
  • 10:57 godog: restarted rsyslog in ulsfo
  • 10:54 godog: restarted rsyslog in esams
  • 09:28 godog: start rolling depooling/restart/pooling of swift frontends in eqiad to pick up syslog change
  • 09:11 _joe_: removing /mnt/tmpfs/fd29e937fea41d186175bcb880ef96980825dd1c.rdf2latex from ocg1001, it contains a 32 gb pdf
  • 09:08 _joe_: restarting node-ocg on ocg1001; a _lot_ of deleted files with the FD still opened
  • 08:10 hashar: Jenkins has been upgraded to latest LTS version 1.565.3
  • 07:33 hashar: Jenkins restarting
  • 07:27 hashar: Stopping Jenkins
  • 06:58 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool es1010 (duration: 00m 07s)
  • 06:53 springle: upgrade es1010 mariadb 10, restart
  • 06:36 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool es1010 (duration: 00m 07s)
  • 06:07 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool es1008, re-enable writes to external storage cluster 25 (duration: 00m 06s)
  • 06:00 ori: mw1053 still flooding error logs with "Unrecognized job type 'EchoNotificationDeleteJob'." Disabling Puppet and jobrunner for now, planning to investigate during SF daytime hours.
  • 05:59 springle: upgrade es1008 mariadb 10, restart
  • 05:21 MaxSem: Manual sync-common on mw1053
  • 05:19 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool es1008 (duration: 00m 08s)
  • 05:04 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: divert writes away from external storage cluster 25 (duration: 00m 10s)
  • 04:40 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: upgraded db1073, repool, warm up (duration: 00m 07s)
  • 04:22 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1073 (duration: 00m 13s)
  • 04:08 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Oct 2 04:08:17 UTC 2014 (duration 8m 16s)
  • 02:47 logmsgbot: LocalisationUpdate completed (1.25wmf1) at 2014-10-02 02:47:20+00:00
  • 02:24 logmsgbot: LocalisationUpdate completed (1.24wmf22) at 2014-10-02 02:24:08+00:00

October 1

  • 23:26 logmsgbot: demon Synchronized php-1.25wmf1/extensions/GuidedTour: (no message) (duration: 00m 04s)
  • 23:25 logmsgbot: demon Synchronized php-1.24wmf22/extensions/GuidedTour: (no message) (duration: 00m 04s)
  • 23:19 logmsgbot: demon Synchronized php-1.25wmf1/extensions/VisualEditor: (no message) (duration: 00m 04s)
  • 23:09 Krinkle: Jenkins restart finished. Patched git-client-plugin seems to work as expected (bug 71533).
  • 22:49 Krinkle: Deploy Jenkins git-client-plugin v1.4.6+wmf1 from https://github.com/wikimedia/git-client-plugin/commits/git-client-1.4.6+wmf1 (c80b05bb10985ab94c4c4217d07a0868087b5994) – https://wikitech.wikimedia.org/wiki/Nova_Resource:Integration/Jenkins_Plugin
  • 21:28 logmsgbot: aaron Synchronized php-1.25wmf1/maintenance/findMissingFiles.php: 832ed2ce9938dc51fdb4190423ce03e93e65c639 (duration: 00m 05s)
  • 21:04 mutante: fenari - shutdown -h now (omg) :)
  • 20:32 hoo: Ran sync-common on mw1053 to stop "Unrecognized job type 'EchoNotificationDeleteJob'." exceptions
  • 20:27 cscott: updated OCG to version 48c495e3656f528abe636ce0cd7562270505534f
  • 19:40 logmsgbot: yurik Synchronized php-1.25wmf1/extensions/ZeroBanner/: (no message) (duration: 01m 05s)
  • 19:38 logmsgbot: yurik Synchronized php-1.24wmf22/extensions/ZeroBanner/: (no message) (duration: 01m 03s)
  • 19:17 mutante: fenari - removed from dsh - rejoice deployers, should be faster now
  • 19:03 logmsgbot: yurik Synchronized php-1.24wmf22/extensions/ZeroBanner/: (no message) (duration: 01m 08s)
  • 18:59 logmsgbot: yurik Synchronized php-1.24wmf22/extensions/ZeroBanner/: (no message) (duration: 01m 09s)
  • 18:56 logmsgbot: yurik Synchronized php-1.24wmf22/extensions/ZeroBanner/: (no message) (duration: 01m 04s)
  • 18:34 Krinkle: (..jenkins) The command runs fine when done in that workspace from shell. Looks like a bug with Jenkins Java abstraction layer.
  • 18:28 ori: disabling puppet on mw1019 to test impact of ProxyBadHeader apache directive
  • 18:25 Krinkle: Jenkins jobs for repos with git submodules broken ("git-submodule: git reset: not found")
  • 17:41 andrewbogott: graceful'd apache on logstash1001 logstash1002 logstash1003
  • 16:56 cmjohnson1: swapping disk db1020
  • 16:32 godog: reverted change to syslog.eqiad.wmnet, back to nfs-home.pmtpa.wmnet
  • 15:40 JetLaggedPanda: purged graphite logs for deployment-mediawiki04 by hand on labmon1001 to prevent it from causing issues on icinga, since the instance has been deleted previously
  • 15:32 ottomata: starting upgrade of stat1002 from precise to trusty
  • 15:30 hashar: Jenkins added jgit as a git provider under https://integration.wikimedia.org/ci/configure
  • 15:30 logmsgbot: anomie Synchronized wmf-config/InitialiseSettings.php: SWAT: Rename wikibase debug log gerrit:164061 (duration: 00m 12s)
  • 15:27 logmsgbot: anomie Synchronized php-1.25wmf1/extensions/Wikidata/: SWAT: Fix js error that breaks editing properties on Wikidata gerrit:164079 (duration: 00m 16s)
  • 15:12 logmsgbot: anomie Synchronized wmf-config: SWAT: Disable the old mwlib PDF render service gerrit:163609 (duration: 00m 09s)
  • 15:05 andrewbogott: switched icinga over to the new ldap servers. Seems to still work so far...
  • 15:02 godog: switched syslog to lithium
  • 15:01 logmsgbot: anomie Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: Enable thumbnail prerendering at upload time on Beta gerrit:163836 (duration: 00m 09s)
  • 14:59 hashar: Jenkins changed git executable path from 'git' to '/usr/bin/git'
  • 14:41 godog: testing syslog change on mw1060
  • 14:40 mark: Shutdown mchenry
  • 14:32 andrewbogott: turning virt0 off again. Soon we won't have a choice about this, trying to flush out issues in the meantime.
  • 14:27 bblack: mexia powered off
  • 14:08 mark: Stopped pdns_recursor on mchenry
  • 14:02 mark: Shutting down dobson
  • 13:49 godog: temporarily override syslog.eqiad.wmnet on mw1053 for testing
  • 13:28 mark: Stopped DNS recursor on dobson
  • 13:21 mark: Stopped OpenDJ on sanger
  • 11:58 godog: reboot lithium for installation
  • 10:38 _joe_: re-enabled puppet on mw1018, repooling in a few
  • 10:16 paravoid: killed cp4006's stale puppet agent_disabled.lock, ran puppet
  • 10:16 akosiaris: started spare disk zeroing process on nas1-a
  • 10:16 akosiaris: destroyed backups aggregate on nas1-a
  • 09:58 akosiaris: destroyed baculasd1, baculasd2 and fr_archive volumes on nas1
  • 09:47 akosiaris: umount /home on fenari. fenari user homes no longer available
  • 09:42 akosiaris: touch /etc/nologin on fenari. Non root logins disallowed
  • 09:04 akosiaris: breaking the snapmirror relationships between nas1-a, nas1001-a. Effect: no more fr_archive syncing, fenari /home no longer is synced
  • 08:35 _joe_: disabling puppet on mw1018, enabling debug logging to get more details about fcgi reported errors
  • 03:51 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Oct 1 03:51:23 UTC 2014 (duration 51m 22s)
  • 02:42 mutante: jenkins config used virt0; login was needed to change the config, which blocked Krinkle
  • 02:38 mutante: bringing virt0 back up did indeed fix login on jenkins; analytics-kafka also appears to still be using it
  • 02:36 logmsgbot: LocalisationUpdate completed (1.25wmf1) at 2014-10-01 02:36:43+00:00
  • 02:31 mutante: virt0 - powering back up, suspecting it broke jenkins login
  • 02:31 ori: mw1053 flooding exception logs with: "Unrecognized job type 'EchoNotificationDeleteJob'." Disabling jobrunner & Puppet
  • 02:25 logmsgbot: LocalisationUpdate completed (1.24wmf22) at 2014-10-01 02:25:11+00:00