Server admin log/Archive 26

December 31

23:44 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1061 (duration: 00m 05s)
14:02 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1065, warm up (duration: 00m 06s)
09:47 godog: updating precise-wikimedia from third-party repo (hwraid)
09:45 godog: previous reprepro update also accidentally updated elasticsearch in trusty-wikimedia to 1.3.7
09:43 godog: updating trusty-wikimedia from third-party repo (hwraid)
02:22 springle: upgrade db1065 trusty
02:16 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1065 (duration: 00m 05s)
02:03 awight: updated payments from 78b72063e4e0cc76b7e168be1e626d5e10e34d4a to 62c81d4574e5e994ff8f3cac7115eff335bd5265
00:52 bd808: restarted elasticsearch on logstash1001
00:49 awight: updated payments from e81f473acc5b31b49dd27714c40f9b71c3462e26 to 78b72063e4e0cc76b7e168be1e626d5e10e34d4a
00:42 bd808: log2udp events still not making it into logstash; possibly related to earlier elasticsearch cluster issues; I don't want to restart elasticsearch on logstash1001 while the cluster is still recovering from that.
00:33 bd808: restarted logstash on logstash1001; log2udp events not being recorded in elasticsearch

December 30

21:52 bd808: restarted elasticsearch on logstash1002; it had dropped from the cluster
20:46 logmsgbot: yurik Synchronized wmf-config/CommonSettings.php: ZeroPortal 182227 (duration: 00m 06s)
19:06 paravoid: manually stopping acct on neon and setting /etc/default/acct ACCT_ENABLE to 0
16:38 godog: killing uwsgi on tungsten, blew memory
14:46 Nemo_bis: morebots is being rude today
14:36 logmsgbot: hoo Synchronized wmf-config/CommonSettings.php: Enable unregistered users editing on it.m.wikipedia.org after Dec 31 (duration: 00m 06s)

December 29

20:19 awight: payments updated from ce7fb9af37c4bba2a84668387b61729df4f9723c to e81f473acc5b31b49dd27714c40f9b71c3462e26
10:35 godog: reboot ms-be2011, stuck while removing a LD, no console

December 27

23:33 paravoid: restarting puppetmasters
20:29 gwicke: dropped old keyspaces titan{,2,3} on xenon to free space for titan4
19:53 ori: gallium: restarted jenkins
16:19 Reedy: jenkins started again...
16:17 Reedy: jenkins killed
16:12 Reedy: attempting to kill jenkins
16:11 Reedy: jenkins is hung with high cpu/memory usage
12:55 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: bump up s1 api load sent to db1066 (duration: 00m 06s)
12:11 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1066 and db1028, warm up (duration: 00m 06s)
05:49 ori: cerium disk space critical, so moved /mnt/data/cassandra/java_{1418354329,1418533386,1418537719}.hprof to /tmp/hprof_files, freeing up ~17G of space.

December 25

20:44 _joe_: restarting hhvm on mw1239, stuck in HPHP::is_valid_var_name probably after trying to call ini_set
00:32 logmsgbot: hoo Synchronized wmf-config/Bug54847.php: Fix for invalid hashes (this prevented some people from logging in) (duration: 00m 05s)
00:26 logmsgbot: spage Synchronized php-1.25wmf12/extensions/PageTriage/modules/ext.pageTriage.views.toolbar/ext.pageTriage.delete.js: Unbreak page curation on enwiki for Xmas (duration: 00m 05s)
00:20 logmsgbot: spage Synchronized php-1.25wmf13/extensions/PageTriage/modules/ext.pageTriage.views.toolbar/ext.pageTriage.delete.js: Unbreak page curation (duration: 00m 06s)

December 24

18:42 paravoid: manually running debmirror on carbon to sync over the holidays; "pkill -f debmirror" should suffice if there is a problem
14:57 akosiaris: disabled puppet on helium while testing copy jobs
13:59 Jeff_Green: package updates and reboots for several fundraising servers...
01:17 hoo: Ran mysql:wikiadmin@db1033 [centralauth]> DELETE FROM bug_54847_password_resets WHERE r_username = 'Stilfehler';

December 23

22:21 logmsgbot: hoo Synchronized wmf-config/: Syncing Kaldari's beta-only change (duration: 00m 07s)
20:34 K4-713: Re-enabled Thank You send job
19:53 logmsgbot: anomie Synchronized wmf-config: Labs-only change (duration: 00m 06s)
19:43 K4-713: disabled Thank You email send job
19:40 chasemp: updated phab sprint app to 0.6.1.4
18:32 paravoid: restarting icinga
15:06 _joe_: gracefully reloading apache on palladium to clean up old puppet master instances
14:50 _joe_: restarted apache on strontium to verify hiera is working
14:17 godog: restart icinga on neon
08:30 _joe_: restarting gitblit, stuck at 100% cpu on a thread
03:25 andrewbogott: graceful'd apache2 on virt1000 (same intermittent passenger crash as always)

December 22

23:43 springle: xtrabackup clone db2010 to db2030
21:26 legoktm: ran delete from localnames where ln_name="Nonoh" and ln_wiki="ruwiki" limit 1; on centralauth for https://phabricator.wikimedia.org/T85041
20:02 awight: update payments from 3dde7be76284aa37b74038dfb4473671999dfcff to ce7fb9af37c4bba2a84668387b61729df4f9723c
19:53 awight: deployed CentralNotice RecordImpression logging for hide cookie bug
19:53 logmsgbot: awight Synchronized php-1.25wmf13/extensions/CentralNotice: RecordImpression logging for CentralNotice hide cookie bug (duration: 00m 06s)
19:53 logmsgbot: awight Synchronized php-1.25wmf12/extensions/CentralNotice: RecordImpression logging for CentralNotice hide cookie bug (duration: 00m 06s)
19:04 anomie: deployed T85113
18:59 YuviPanda: running sync-common on virt1000
17:42 cmjohnson1: taking neon down again to reseat idrac nic card
17:04 cmjohnson1: powering down neon (icinga) to drain flea power and reset idrac
16:37 _joe_: uploading java8 packages for trusty

December 21

14:18 hashar: Upgrading Zuul merger and server on gallium wmf-deploy-20141030-1 to wmf-deploy-20141221-1
14:16 hashar: Restarted morebots on tools labs following some doc at https://wikitech.wikimedia.org/wiki/Morebots
14:16 hashar: Cleaning the mess Zuul code base is https://phabricator.wikimedia.org/T84917 . Updated master/labs branches and tagged it wmf-deploy-20141221-1

December 20

20:45 qchris: restarted webperf service statsd-mw-js-deprecate on hafnium. It seems it did not send metrics to statsd after an EventLogging restart.
02:59 logmsgbot: mattflaschen Synchronized php-1.25wmf13/resources/src/jquery.tipsy/jquery.tipsy.js: Fix "live" deprecated live mode of jQuery tipsy (duration: 00m 05s)
02:59 logmsgbot: mattflaschen Synchronized php-1.25wmf13/resources/src/jquery.tipsy/jquery.tipsy.js: Fix "live" deprecated live mode of jQuery tipsy (duration: 00m 05s)
02:59 logmsgbot: mattflaschen Synchronized php-1.25wmf12/resources/src/jquery.tipsy/jquery.tipsy.js: Fix "live" deprecated live mode of jQuery tipsy (duration: 00m 05s)
02:57 logmsgbot: mattflaschen Synchronized php-1.25wmf13/extensions/PageTriage/modules/ext.pageTriage.views.toolbar/: Fix to PageTriage not to use jQuery live (duration: 00m 07s)
02:57 logmsgbot: mattflaschen Synchronized php-1.25wmf12/extensions/PageTriage/modules/ext.pageTriage.views.toolbar/: Fix to PageTriage not to use jQuery live (duration: 00m 05s)
01:18 awight: payments rolled back to 3dde7be76284aa37b74038dfb4473671999dfcff
00:57 awight: payments updated from 3dde7be76284aa37b74038dfb4473671999dfcff to ce7fb9af37c4bba2a84668387b61729df4f9723c
00:35 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1072 warm up. depool db1066 replication error (duration: 00m 05s)

December 19

23:39 awight: payments rolled back ab93b636fae7bcb38a155c019ad102f3b071918c --> 3dde7be76284aa37b74038dfb4473671999dfcff
23:28 awight: payments updated from 3dde7be76284aa37b74038dfb4473671999dfcff to ab93b636fae7bcb38a155c019ad102f3b071918c
23:23 awight: rollback payments to 3dde7be76284aa37b74038dfb4473671999dfcff
23:18 awight: updated payments from 3dde7be76284aa37b74038dfb4473671999dfcff to ab93b636fae7bcb38a155c019ad102f3b071918c
21:27 awight: update crm from ae7b2381667dd65d68812c58f61e3ea66fa9fa6f to 80241fd2a43f03796b416d728661470f875a590a
17:54 hoo: Manually transferred the email from enwiki account "Hob Gadling" to the centralauth account of the same name (after a partially failed account creation).
12:54 logmsgbot: aude Synchronized php-1.25wmf12/extensions/Wikidata/extensions/Wikibase/lib/resources/jquery.wikibase: js caching issues (duration: 00m 05s)
07:15 andrewbogott: disabled puppet and nova-compute on virt1010 and virt1011 until I can sort out a libvirt issue.
06:55 _joe_: restarted HHVM on mw1184, stuck in HPHP::StatCache::refresh
03:36 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1028 (duration: 00m 06s)
02:37 logmsgbot: yurik Synchronized php-1.25wmf13/extensions/ZeroPortal: (no message) (duration: 00m 05s)
02:36 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1055, warm up (duration: 00m 06s)
02:35 Jeff_Green: pay-lvs1001 inadvertently power cycled
02:19 logmsgbot: ori Synchronized wmf-config/StartProfiler.php: re-enable xenon (duration: 00m 06s)
01:56 logmsgbot: yurik Synchronized php-1.25wmf13/extensions/ZeroPortal: (no message) (duration: 00m 06s)
01:21 logmsgbot: maxsem Synchronized php-1.25wmf13/extensions/VisualEditor/: (no message) (duration: 00m 07s)
01:21 logmsgbot: maxsem Synchronized php-1.25wmf13/extensions/MobileFrontend/: (no message) (duration: 00m 05s)
01:13 logmsgbot: maxsem Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/180818 (duration: 00m 05s)
00:57 logmsgbot: maxsem Synchronized php-1.25wmf12/extensions/VisualEditor/: https://gerrit.wikimedia.org/r/#/c/180860/ (duration: 00m 07s)
00:56 logmsgbot: maxsem Synchronized php-1.25wmf12/resources/lib/oojs-ui/oojs-ui.js: https://gerrit.wikimedia.org/r/#/c/180860/ (duration: 00m 08s)
00:53 logmsgbot: demon Synchronized php-1.25wmf13/includes/Html.php: (no message) (duration: 00m 05s)
00:30 logmsgbot: maxsem Synchronized php-1.25wmf12/extensions/Wikidata/: https://gerrit.wikimedia.org/r/#q,181000,n,z (duration: 00m 13s)
00:28 logmsgbot: maxsem Synchronized php-1.25wmf13/extensions/Wikidata/: https://gerrit.wikimedia.org/r/#q,181003,n,z (duration: 00m 12s)
00:18 logmsgbot: maxsem Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/180867 (duration: 00m 06s)
00:05 Krinkle: Reloading Zuul to deploy I3333f5e45

December 18

23:35 logmsgbot: mattflaschen Finished scap: Deploy changes to Flow to fix preview (both branches) and add commit metadata (1.25wmf13) (duration: 26m 33s)
23:09 logmsgbot: mattflaschen Started scap: Deploy changes to Flow to fix preview (both branches) and add commit metadata (1.25wmf13)
22:54 logmsgbot: demon Synchronized wmf-config/StartProfiler.php: fix profiling again, this time with feeling (duration: 00m 08s)
22:19 logmsgbot: ori Synchronized php-1.25wmf13/includes/parser/MWTidy.php: I7e67a61f7: Revert "Simplify MWTidy" (duration: 00m 05s)
22:19 logmsgbot: ori Synchronized php-1.25wmf12/includes/parser/MWTidy.php: I03cc1f46f: Revert "Simplify MWTidy" (duration: 00m 14s)
20:39 logmsgbot: bd808 Synchronized php-1.25wmf13/includes/profiler/ProfilerXhprof.php: xhprof: backport section profiler fixes (duration: 00m 07s)
20:14 logmsgbot: bd808 Synchronized php-1.25wmf13/tests/phpunit/includes/api/format/ApiFormatWddxTest.php: Skip ApiFormatWddxTest under HHVM (duration: 00m 07s)
19:11 logmsgbot: bd808 Synchronized php-1.25wmf12/includes/profiler/ProfilerXhprof.php: backport section profiler fixes [I5935ee2] (duration: 00m 05s)
18:52 logmsgbot: bd808 Synchronized php-1.25wmf12/includes/utils/IP.php: Log calls to IP::parseRange with invalid array argument [Ie883eb6] (duration: 00m 05s)
18:42 logmsgbot: bd808 Synchronized php-1.25wmf12/tests/phpunit/includes/api/format/ApiFormatWddxTest.php: syncing test fix Ia58ec20 (duration: 00m 06s)
18:25 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: logging for IP argument bugs (duration: 00m 05s)
18:09 bblack: analytics vlan ACLs updated in eqiad
17:59 logmsgbot: demon Synchronized wmf-config/: Disable lsearchd almost everywhere (duration: 00m 07s)
17:42 logmsgbot: demon Synchronized wmf-config/: Remove cirrus-as-alternate settings (duration: 00m 06s)
17:22 bblack: switched to new unified cert on all nginx terminators via config reload
17:18 godog: enabling md write intent bitmap temporarily on virt1009
17:08 hashar: Made mediawiki-phpunit-hhvm Jenkins job voting. We now enforce HHVM compliance for mediawiki/core
16:37 hashar: gallium deleting obsolete jobs: rm -fR /srv/ssd/jenkins-slave/workspace/*-testextension . They are now suffixed with -zend and -hhvm
16:35 godog: deleting jenkins workspaces on lanthanum older than 30d
16:32 hashar: lanthanum deleting obsolete jobs: rm -fR /srv/ssd/jenkins-slave/workspace/*-testextension . They are now suffixed with -zend and -hhvm
16:24 logmsgbot: demon Synchronized wmf-config/StartProfiler.php: disable normal eqiad profiling (duration: 00m 06s)
16:13 logmsgbot: manybubbles Synchronized php-1.25wmf12/extensions/MultimediaViewer/: SWAT backport last-modified performance logging for mediaviewer (duration: 00m 05s)
16:06 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: SWAT disable thumbnail caching (duration: 00m 05s)
16:06 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings-labs.php: SWAT disable thumbnail caching (duration: 00m 05s)
15:43 _joe_: rebooted mw1191
15:07 logmsgbot: demon Synchronized wmf-config/StartProfiler.php: disable xhprof-backed flame graphs for now (duration: 00m 05s)
14:57 _joe_: rebooting mw1257
14:44 _joe_: restarting hhvm on mw1191
14:01 qchris: EventLogging: deployed 937d804 & restarted EventLogging
13:54 akosiaris: purged unpuppetized rrdcached from hafnium. It was segfaulting when started via the init script, which led to the package being unconfigured which led to dpkg alerts on icinga
13:42 _joe|lunch: restarting hhvm on a few servers
13:11 _joe_: restarted hhvm on mw1242, stuck in getrusage()
13:03 _joe_: restarted hhvm on mw1191, load at 200
13:00 paravoid: salt-cleaning up /etc/sudoers.d/50_* (old naming scheme)
12:04 godog: upload carbon-c-relay 0.36+git20141218-1 to trusty-wikimedia
09:13 hashar: enabled MediaWiki core 'structure' PHPUnit tests for all extensions. Will require folks to fix their incorrect AutoLoader and ResourceLoader entries. 180496 bug T78798
06:23 _joe|justawake: restarted the puppetmaster on palladium
04:35 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1055 (duration: 04m 19s)
01:42 logmsgbot: ori Synchronized php-1.25wmf12/resources/src/mediawiki.action/mediawiki.action.edit.stash.js: Ibb29a825c: mediawiki.action.edit.stash: set timeout to 4 seconds (duration: 00m 05s)
01:31 awight: update crm from f1e558592ee98ff8fc84d19ff2c0435619e11242 to ae7b2381667dd65d68812c58f61e3ea66fa9fa6f
01:25 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1072 (duration: 00m 08s)
01:22 springle: mw1191 restarted hhvm, apparently stuck in futex
00:38 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1066, warm up (duration: 00m 05s)
00:24 hashar: Restarting Jenkins to remove a deadlock on deployment-bastion slave
00:19 logmsgbot: maxsem Synchronized php-1.25wmf13/extensions/Flow/: https://gerrit.wikimedia.org/r/#/c/180677/ (duration: 00m 08s)
00:01 logmsgbot: maxsem Synchronized wmf-config/: https://gerrit.wikimedia.org/r/#/c/180376/ - no-op for prod (duration: 00m 06s)

December 17

23:34 hashar: Restarted Jenkins and Zuul again to have a clean start while I am crashing to bed.
23:22 logmsgbot: demon Synchronized wmf-config/StartProfiler.php: xhprof on all hhvm hosts in eqiad (duration: 00m 05s)
22:46 hashar: restarting Jenkins
21:45 hashar: killing Jenkins
21:41 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.25wmf13
21:40 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias to 1.25wmf12
21:38 logmsgbot: reedy Finished scap: testwiki to 1.25wmf13 and build l10n cache (duration: 12m 26s)
21:25 logmsgbot: reedy Started scap: testwiki to 1.25wmf13 and build l10n cache
21:24 Reedy: @ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @ for mw1152
20:36 hashar: Jenkins/Zuul had some deadlock. Disconnected/reconnected slaves but that did not fix it. Finally had to disconnect/reconnect the Gearman client in Jenkins and it is processing again.
20:36 logmsgbot: reedy Started scap: testwiki to 1.25wmf13 and build l10n cache
20:12 hashar: Jenkins: some slaves are no longer properly registered. Unpooling / Repooling them
16:42 logmsgbot: demon Synchronized php-1.25wmf12/extensions/TextExtracts/: (no message) (duration: 00m 05s)
16:27 logmsgbot: demon Synchronized wmf-config/CirrusSearch-labs.php: for completeness (duration: 00m 05s)
16:20 ^d: mw1190: manually ran sync-common since it was yelling about my key earlier
16:14 logmsgbot: demon Synchronized php-1.25wmf12/includes/specials/SpecialSearch.php: (no message) (duration: 00m 06s)
16:10 logmsgbot: demon Synchronized php-1.25wmf12/extensions/Wikidata/: (no message) (duration: 00m 12s)
14:52 akosiaris: uploaded apertium-nno-nob_1.0.0+svn~57977-1 to apt.wikimedia.org
14:43 anomie: Merged and fetched gerrit:180477, so undeployed bad extension changes from gerrit:180229 are no longer a danger
13:47 akosiaris: uploaded apertium-nob_0.1.0+svn~58076-1 and apertium-nno_0.1.0+svn~58076-1 to apt.wikimedia.org
11:59 _joe_: removing some core dumps from appservers, so that we don't run out of space by tomorrow
11:52 Nemo_bis: Don't sync extensions, undeployed unintentional reverts https://wikitech.wikimedia.org/?diff=138472&oldid=138399
10:54 hashar: Jenkins deleting legacy 'mwext*testextension' jobs (now suffixed with '-zend') and restarting Jenkins.
10:40 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1064, warm up (duration: 00m 05s)
10:30 andrewbogott: virt1010 and 1011 are up but with puppet and nova-compute disabled pending firewall issues
10:29 hashar: mw1152 is a jobrunner being rebuilt
10:26 hashar: mw1152 has a wrong host key in /etc/ssh/ssh_known_hosts:2480 causing scap to spurt a remote identification error.
10:26 logmsgbot: hashar Synchronized wmf-config/throttle.php: 180429 - Throttle rule for University of Haifa event (duration: 00m 06s)
10:25 logmsgbot: hashar Synchronized wmf-config/throttle.php: 180429 - Throttle rule for University of Haifa event (duration: 00m 06s)
09:57 _joe_: jobrunner started on mw1152
08:43 _joe_: depooling mw1152, reimaging as an HAT jobrunner
07:52 godog: increase minimum raid reconstruction speed on virt1005 and virt1009
06:52 springle: upgrade db1064 trusty
06:14 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1064 (duration: 00m 06s)
04:39 springle: mw1015 sync-common
04:21 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1066 (duration: 00m 05s)
03:11 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1073, warm up (duration: 00m 06s)
02:48 ori: restarted jobrunner on jobrunners
02:42 logmsgbot: ori Synchronized php-1.25wmf12/includes/parser/MWTidy.php: I4909e5e20: use stream_select() to get external tidy stdout/stderr (uncommitted; pending review) (duration: 00m 33s)
01:01 logmsgbot: maxsem Synchronized php-1.25wmf12/extensions/Wikidata/: https://gerrit.wikimedia.org/r/180368 (duration: 00m 59s)
00:59 logmsgbot: maxsem Synchronized php-1.25wmf12/extensions/Flow/: https://gerrit.wikimedia.org/r/#/c/180303/ (duration: 00m 41s)
00:56 logmsgbot: maxsem Synchronized php-1.25wmf12/includes/: https://gerrit.wikimedia.org/r/#/c/180214/ part 2 (duration: 01m 38s)
00:54 logmsgbot: maxsem Synchronized php-1.25wmf12/autoload.php: https://gerrit.wikimedia.org/r/#/c/180214/ part 1 (duration: 00m 26s)
00:51 logmsgbot: maxsem Synchronized wmf-config/: (no message) (duration: 00m 52s)
00:33 logmsgbot: maxsem Synchronized wmf-config/Wikibase.php: https://gerrit.wikimedia.org/r/179469 (duration: 01m 22s)
00:24 logmsgbot: tstarling Synchronized php-1.25wmf12/extensions/SecurePoll/includes/crypt/Crypt.php: tallying fix (duration: 01m 04s)
00:15 bblack: killed runJobs procs on mw1015 with init as parent

December 16

23:55 Tim: fixed MW cgroup on tin
23:47 logmsgbot: tstarling Synchronized wmf-config/CommonSettings.php: SecurePoll debugging (duration: 01m 01s)
23:32 logmsgbot: legoktm Synchronized wmf-config/CommonSettings.php: Temporarily disable wgCentralAuthAutoMigrate (duration: 01m 17s)
23:31 ori: disabled puppet on tin and removed mw1015 from mediawiki-installation dsh group
22:34 hoo: Updated the Wikidata property suggester with data from Monday's JSON dump
20:43 Jamesofur: inserted decryption key for English Wikipedia Arbitration Committee Election (2014)
20:35 twentyafterfour: spam
19:50 logmsgbot: twentyafterfour Synchronized wmf-config/Wikibase.php: (no message) (duration: 00m 05s)
19:41 logmsgbot: twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group1 to 1.25wmf12
19:19 Krinkle: Reloading Zuul to deploy Id2cfcdfd56220
19:18 awight: update crm from 28b68e23b670fe52a401659bde800b64d05e25bf to f1e558592ee98ff8fc84d19ff2c0435619e11242
19:15 logmsgbot: catrope Synchronized php-1.25wmf12/skins/Vector/: Revert watch star change (duration: 00m 05s)
17:30 legoktm: deleted apparently invalid timecorrection preference for user_id=68157 on simplewiki
16:44 logmsgbot: anomie Synchronized php-1.25wmf12/extensions/Translate: SWAT: Translate: Revert "Request csrf tokens in JS when supported" gerrit:180201 (duration: 01m 06s)
16:31 cmjohnson: removing mw1192 from pybal and disabling puppet for hardware troubleshooting
16:22 logmsgbot: anomie Synchronized php-1.25wmf12/extensions/Wikidata/: SWAT: extensions/Wikidata to 9d03a1df13ede425673da9ce57c440b59e867aa6 gerrit:180184 (duration: 00m 21s)
16:01 logmsgbot: anomie Synchronized wmf-config: SWAT: Enable $wgExtractsExtendOpenSearchXml gerrit:179168 (duration: 00m 07s)
15:58 _joe_: load test done, the apache appserver pool can work flawlessly with 110 servers in the pool
15:00 _joe_: depooling part of the apache appserver pool to assess current load
13:20 mutante: started lighttpd on sodium
09:31 godog: upgrade diamond in trusty/eqiad
09:29 godog: upgrade diamond in trusty/esams
08:42 godog: upgrading trusty/codfw to diamond 3.5-3
07:05 springle: upgrade db1073 trusty
06:39 springle: sync-common on mw1043 after sync-file fail (originally logged at 06:32)
06:38 springle: (relayed from logmsgbot) springle Synchronized wmf-config/db-eqiad.php: depool db1073 (duration: 00m 06s)
03:46 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Dec 16 03:46:47 UTC 2014 (duration 46m 46s)
02:37 springle: upgrade db1055 trusty
02:24 logmsgbot: LocalisationUpdate completed (1.25wmf12) at 2014-12-16 02:23:58+00:00
02:23 logmsgbot: l10nupdate Synchronized php-1.25wmf12/cache/l10n: (no message) (duration: 00m 02s)
02:12 logmsgbot: LocalisationUpdate completed (1.25wmf11) at 2014-12-16 02:12:20+00:00
02:12 logmsgbot: l10nupdate Synchronized php-1.25wmf11/cache/l10n: (no message) (duration: 00m 01s)
02:06 logmsgbot: aude Finished scap: Update test.wikidata (duration: 18m 38s)
01:47 logmsgbot: aude Started scap: Update test.wikidata
01:36 logmsgbot: yurik Synchronized php-1.25wmf12/extensions/ZeroBanner: (no message) (duration: 00m 05s)
01:34 logmsgbot: yurik Synchronized php-1.25wmf11/extensions/ZeroBanner: (no message) (duration: 00m 09s)
01:30 logmsgbot: yurik Synchronized php-1.25wmf12/extensions/ZeroPortal: (no message) (duration: 00m 07s)
01:20 logmsgbot: yurik Synchronized php-1.25wmf11/extensions/ZeroBanner: (no message) (duration: 00m 06s)
01:15 logmsgbot: maxsem Finished scap: (no message) (duration: 26m 35s)
00:49 logmsgbot: maxsem Started scap: (no message)
00:37 logmsgbot: maxsem Synchronized php-1.25wmf12/extensions/Flow/: https://gerrit.wikimedia.org/r/#/c/180043/ (duration: 00m 08s)
00:34 awight: payments updated from f3fd79aaaf730f8fd18a72f83c11e9cc111a0aab to 3dde7be76284aa37b74038dfb4473671999dfcff
00:29 logmsgbot: maxsem Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/179981 (duration: 00m 07s)
00:12 logmsgbot: maxsem Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/179513 (duration: 00m 06s)

December 15

23:43 awight: update payments from 21afbcf24c0e2124f783cc3c2c65621569675d6f to f3fd79aaaf730f8fd18a72f83c11e9cc111a0aab
23:18 csteipp: deploy patch for T71209
23:18 csteipp: redeploy patches for T77624 & T76195
20:23 YuviPanda: cleaned out /var/log/wikidatadumps on snapshot1003 because hoo needs them anyway?
18:27 logmsgbot: bd808 Synchronized wmf-config/CommonSettings.php: Set wgTranslateTranslationServices['TTMServer']['cutoff'] [I138b22a] (duration: 00m 07s)
18:10 logmsgbot: bd808 Synchronized wmf-config/InitialiseSettings.php: Sample the GlobalTitleFail log at 1:10000 [I280ac3d] (duration: 00m 07s)
17:37 logmsgbot: bd808 Synchronized wmf-config/InitialiseSettings.php: Enable MWLoggerMonologSpi for group0 wikis [I2f72f97] (duration: 00m 05s)
17:26 logmsgbot: bd808 Synchronized wmf-config/InitialiseSettings.php: Enable MWLoggerMonologSpi for testwiki [I419eb0d] (duration: 00m 05s)
17:23 logmsgbot: bd808 Synchronized docroot/noc/createTxtFileSymlinks.sh: Optional MWLoggerMonologSpi configuration [I720f2cb] (for real this time) (duration: 00m 06s)
17:22 logmsgbot: bd808 Synchronized wmf-config: Optional MWLoggerMonologSpi configuration [I720f2cb] (for real this time) (duration: 00m 06s)
17:14 logmsgbot: bd808 Synchronized docroot/noc/createTxtFileSymlinks.sh: Optional MWLoggerMonologSpi configuration [I720f2cb] (duration: 00m 05s)
17:13 logmsgbot: bd808 Synchronized wmf-config: Optional MWLoggerMonologSpi configuration [I720f2cb] (duration: 00m 05s)
17:10 logmsgbot: bd808 Synchronized wmf-config/InitialiseSettings.php: Introduce wmgUseMonologLogger feature flag [I61fa967] (duration: 00m 07s)
16:33 logmsgbot: marktraceur Synchronized php-1.25wmf12/extensions/UploadWizard/: [SWAT] [wmf12] Fix Flickr imports in UploadWizard (duration: 00m 05s)
16:30 logmsgbot: marktraceur Synchronized php-1.25wmf11/extensions/UploadWizard/: [SWAT] [wmf11] Fix Flickr imports in UploadWizard (duration: 00m 05s)
16:21 logmsgbot: marktraceur Synchronized php-1.25wmf11/extensions/MultimediaViewer/: [SWAT] [wmf11] - Track the most recent upload time for performance events (Media Viewer) (duration: 00m 05s)
16:12 logmsgbot: marktraceur Synchronized php-1.25wmf12/extensions/Wikidata/: [SWAT] [wmf12] - Update test.wikidata (fixes/polish for changes to the site link section, and performance improvements for page views). (duration: 00m 24s)
15:31 godog: upload diamond 3.5-3 to trusty-wikimedia
14:01 godog: reinstall python-twisted-bin python-twisted-core python-twisted-web on labmon1001
14:00 robh: zinc removed from icinga, system is now shutdown for reclaim per RT8939
13:50 robh: reclaiming zinc to spares, stopped puppet agent
13:13 akosiaris: uploaded hfst_3.8.1~r4088-1 to apt.wikimedia.org (trusty)
11:51 hashar: Zuul: clearing out some old zuul git references ( https://phabricator.wikimedia.org/T70481 ). Running in a screen on gallium
09:48 hashar: Upgrading composer on CI to v1.0.0-alpha9 178550
06:57 logmsgbot: ori Synchronized wmf-config/InitialiseSettings.php: Id9023e66c: Sample "api" debug log group at 1:1000 (duration: 00m 06s)
03:36 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Dec 15 03:36:24 UTC 2014 (duration 36m 23s)
02:15 logmsgbot: LocalisationUpdate completed (1.25wmf12) at 2014-12-15 02:15:01+00:00
02:15 logmsgbot: l10nupdate Synchronized php-1.25wmf12/cache/l10n: (no message) (duration: 00m 01s)
02:09 logmsgbot: LocalisationUpdate completed (1.25wmf11) at 2014-12-15 02:09:21+00:00
02:09 logmsgbot: l10nupdate Synchronized php-1.25wmf11/cache/l10n: (no message) (duration: 00m 02s)

December 14

03:33 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Dec 14 03:33:14 UTC 2014 (duration 33m 13s)
02:16 logmsgbot: LocalisationUpdate completed (1.25wmf12) at 2014-12-14 02:16:31+00:00
02:16 logmsgbot: l10nupdate Synchronized php-1.25wmf12/cache/l10n: (no message) (duration: 00m 01s)
02:11 logmsgbot: LocalisationUpdate completed (1.25wmf11) at 2014-12-14 02:11:43+00:00
02:11 logmsgbot: l10nupdate Synchronized php-1.25wmf11/cache/l10n: (no message) (duration: 00m 02s)

December 13

12:20 andrewbogott: graceful'd apache2 on virt1000; puppet master was acting up.
09:51 hashar: Restarting Jenkins to get rid of some deadlocks that occurred yesterday
03:29 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Dec 13 03:29:31 UTC 2014 (duration 29m 30s)
02:14 logmsgbot: LocalisationUpdate completed (1.25wmf12) at 2014-12-13 02:14:00+00:00
02:14 logmsgbot: l10nupdate Synchronized php-1.25wmf12/cache/l10n: (no message) (duration: 00m 01s)
02:09 logmsgbot: LocalisationUpdate completed (1.25wmf11) at 2014-12-13 02:09:28+00:00
02:09 logmsgbot: l10nupdate Synchronized php-1.25wmf11/cache/l10n: (no message) (duration: 00m 01s)
02:04 logmsgbot: krinkle Synchronized w/robots.php: 54746fdef3402 (duration: 00m 05s)
00:58 logmsgbot: krinkle Synchronized w/robots.php: 611892c62349d09c9758 (duration: 00m 06s)

December 12

22:06 logmsgbot: ori Synchronized wmf-config/StartProfiler.php: (no message) (duration: 00m 06s)
19:47 ottomata1: initiating kafka preferred-replica-election to bring analytics1021 back in to leadership :/ need to figure this out, or replace this node soon.
18:15 YuviPanda: ran sudo logrotate -f /etc/logrotate.d/dumpwikidatajson on snapshot1003 for hoo
18:13 logmsgbot: ori Synchronized wmf-config/StartProfiler.php: (no message) (duration: 00m 08s)
16:42 akosiaris: uploaded apertium-sv-da, apertium-en-ca to apt.wikimedia.org
15:13 hashar: Zuul Reverting Zuul back to wmf-deploy-20141030-4 . I previously reverted it to another change which was wrong.
14:59 hashar: Zuul status page is no more. https://phabricator.wikimedia.org/T78400
14:50 hashar: upgrading python-statsd on Zuul server and restarting service.
14:37 godog: upload python-statsd 3.0.1-1 to precise-wikimedia
14:13 godog: upload python-statsd 3.0.1-1 to trusty-wikimedia
11:43 YuviPanda: force puppet run on all labs hosts via salt
09:33 ori: restarted mwprof on tungsten
09:20 logmsgbot: ori Synchronized wmf-config/StartProfiler.php: I63864cc79: xenon log: collate stack samples and fold into single lines (duration: 00m 06s)
06:05 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1055 (duration: 00m 05s)
03:50 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Dec 12 03:50:10 UTC 2014 (duration 50m 9s)
03:28 logmsgbot: yurik Synchronized php-1.25wmf12/extensions/ZeroPortal/: updating ZeroPortal to master - urgent bugfix - retry (duration: 00m 10s)
03:27 logmsgbot: yurik Synchronized php-1.25wmf12/extensions/ZeroPortal/: updating ZeroPortal to master - urgent bugfix (duration: 00m 05s)
03:25 logmsgbot: ori Synchronized wmf-config: I1d218c2d6: Log xenon-captured traces via wfDebugLog (duration: 00m 06s)
02:55 andrewbogott: rebooted mw1041 from mgmt
02:20 logmsgbot: LocalisationUpdate completed (1.25wmf12) at 2014-12-12 02:20:37+00:00
02:20 logmsgbot: l10nupdate Synchronized php-1.25wmf12/cache/l10n: (no message) (duration: 00m 03s)
02:15 logmsgbot: LocalisationUpdate completed (1.25wmf11) at 2014-12-12 02:15:34+00:00
02:15 logmsgbot: l10nupdate Synchronized php-1.25wmf11/cache/l10n: (no message) (duration: 00m 03s)
00:41 logmsgbot: maxsem Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/179360 (duration: 00m 06s)
00:35 godog: profiler-to-carbon is logging too much on tungsten, cause unknown yet but don't restart
00:30 logmsgbot: maxsem Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/179359 (duration: 00m 11s)
00:25 logmsgbot: maxsem Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/179341/ (duration: 00m 05s)
00:17 logmsgbot: maxsem Synchronized php-1.25wmf11/extensions/MobileFrontend: (no message) (duration: 00m 06s)
00:17 logmsgbot: maxsem Synchronized php-1.25wmf11/extensions/WikiGrok/: (no message) (duration: 00m 06s)
00:15 logmsgbot: maxsem Synchronized php-1.25wmf12/extensions/WikiGrok/: (no message) (duration: 00m 06s)
00:15 logmsgbot: maxsem Synchronized php-1.25wmf12/extensions/MobileFrontend/: (no message) (duration: 00m 05s)
00:09 godog: stop profiler-to-carbon on tungsten

December 11

23:44 bd808: restarted logstash on logstash1001; fatalmonitor report was empty since ~20:30z
23:35 logmsgbot: bd808 Synchronized wmf-config: Revert Configure logging to use MWLoggerMonologSpi (Ib8ddd86) (duration: 00m 05s)
23:33 logmsgbot: bd808 Synchronized wmf-config: quick revert -- Configure logging to use MWLoggerMonologSpi (I99a032f) (duration: 00m 07s)
23:30 logmsgbot: bd808 Synchronized wmf-config: Configure logging to use MWLoggerMonologSpi (I99a032f) (duration: 00m 09s)
23:05 logmsgbot: maxsem Finished scap: i18n update for CentralNotice (duration: 29m 09s)
22:56 cscott: updated Parsoid to version d16dd2db
22:50 cscott: updated OCG to version bfc3812ef346c9f767135b339cedd123a1bcac98
22:45 hashar: Disconnected/reconnected the Jenkins Gearman client which unstuck Zuul magically.
22:42 hashar: Zuul stuck
22:36 logmsgbot: maxsem Started scap: i18n update for CentralNotice
21:47 hashar: Jenkins: re-adding integration-slave1009 to the pool of slaves
21:09 logmsgbot: LocalisationUpdate completed (1.25wmf12) at 2014-12-11 21:09:26+00:00
21:09 logmsgbot: awight Synchronized php-1.25wmf12/cache/l10n: (no message) (duration: 00m 01s)
20:58 logmsgbot: LocalisationUpdate completed (1.25wmf11) at 2014-12-11 20:58:24+00:00
20:58 logmsgbot: awight Synchronized php-1.25wmf11/cache/l10n: (no message) (duration: 00m 03s)
20:13 logmsgbot: awight Synchronized php-1.25wmf12/extensions/CentralNotice: IE fix for CentralNotice hide cookies (duration: 00m 06s)
20:13 logmsgbot: awight Synchronized php-1.25wmf11/extensions/CentralNotice: IE fix for CentralNotice hide cookies (duration: 00m 07s)
19:30 chrisjohnson: powering down tmh1002 to replace failed disk
19:21 legoktm: rescuing revisions on frwiki (https://phabricator.wikimedia.org/T76979)
18:41 godog: restart profiler-to-carbon on tungsten to pick up changes, including hhvm-profiler-to-carbon
18:20 logmsgbot: ori Synchronized php-1.25wmf12/resources/src/mediawiki.action/mediawiki.action.edit.stash.js: Ib2de3f15: Stash edit when user idles (duration: 00m 05s)
18:16 ejegg: updated dash from 08b078acf904d563030ff7a37b2af8df88387e29 to 6631a97e5e3e688bc0f4d2a1f6f5d97744dba0f4
17:41 ottomata: starting trusty upgrade of analytics1019
17:18 paravoid: restarting apache on strontium
17:06 _joe_: restarting HHVM on mw1237, stuck in HPHP::StatCache::refresh
16:59 godog: restarted gmond on ms-fe1001, all swift machines under this aggregator were showing offline
16:50 logmsgbot: marktraceur Synchronized wmf-config/: [SWAT] [config] Actually Revert 'Configure logging to use MWLoggerMonologSpi' (duration: 00m 10s)
16:44 logmsgbot: marktraceur Synchronized wmf-config/: [SWAT] [config] Revert Configure logging to use MWLoggerMonologSpi (duration: 00m 05s)
16:44 ottomata: starting trusty upgrade of analytics1011
16:43 logmsgbot: marktraceur Synchronized wmf-config/: [SWAT] [config] Configure logging to use MWLoggerMonologSpi (duration: 00m 07s)
16:28 logmsgbot: marktraceur Synchronized private/PrivateSettings.php: [SWAT] [config] Add password for logstash (duration: 00m 10s)
16:25 logmsgbot: marktraceur Synchronized php-1.25wmf12/extensions/WikimediaEvents/WikimediaEvents.php: [SWAT] [wmf12] Bump sendBeacon schema revision so new URL will be generated (duration: 00m 16s)
16:23 logmsgbot: marktraceur Synchronized php-1.25wmf11/extensions/WikimediaEvents/WikimediaEvents.php: [SWAT] [wmf11] Bump sendBeacon schema revision so new URL will be generated (duration: 00m 14s)
16:12 logmsgbot: marktraceur Synchronized wmf-config/InitialiseSettings.php: [SWAT] [config] Redisable WikiGrok on enwiki (duration: 00m 05s)
16:07 logmsgbot: marktraceur Synchronized wmf-config/InitialiseSettings.php: [SWAT] [config] Reenable WikiGrok on enwiki (duration: 00m 07s)
15:59 ottomata: starting trusty upgrade of analytics1033
15:04 hashar: @damons we love you!
15:01 hashar: saved Jenkins configuration via the web interface to reset the interface language from Chinese to English
13:31 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: pool db1004 in s7, warm up (duration: 00m 06s)
06:02 ori: restarted apache on palladium and strontium
04:08 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Dec 11 04:08:50 UTC 2014 (duration 8m 49s)
04:08 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Dec 11 04:08:33 UTC 2014 (duration 30m 22s)
03:34 logmsgbot: ori Synchronized php-1.25wmf11/extensions/Math: Ic438b307a3b46: Fix for fatal caused by static call to MathRenderer::getError (duration: 00m 06s)
02:57 Krinkle: git-deploy: Deploying integration/mediawiki-tools-codesniffer I602cb6cfe910fc0a
02:45 springle: xtrabackup clone db1007 to db1004
02:12 logmsgbot: LocalisationUpdate completed (1.25wmf12) at 2014-12-11 02:12:04+00:00
02:12 logmsgbot: l10nupdate Synchronized php-1.25wmf12/cache/l10n: (no message) (duration: 00m 01s)
02:10 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1015, warm up (duration: 00m 08s)
02:09 logmsgbot: LocalisationUpdate completed (1.25wmf12) at 2014-12-11 02:09:43+00:00
02:09 logmsgbot: yurik Synchronized php-1.25wmf12/cache/l10n: (no message) (duration: 00m 01s)
02:08 logmsgbot: LocalisationUpdate completed (1.25wmf11) at 2014-12-11 02:08:34+00:00
02:08 logmsgbot: l10nupdate Synchronized php-1.25wmf11/cache/l10n: (no message) (duration: 00m 01s)
01:57 springle: upgrade db1015 trusty
01:56 logmsgbot: LocalisationUpdate completed (1.25wmf11) at 2014-12-11 01:56:51+00:00
01:56 logmsgbot: yurik Synchronized php-1.25wmf11/cache/l10n: (no message) (duration: 00m 01s)
01:41 logmsgbot: maxsem Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/179036 (duration: 00m 06s)
01:38 csteipp: redeploy core fixes for wmf12
01:34 logmsgbot: maxsem Finished scap: Noop, regenerating l10n cache for ZeroBanner (duration: 33m 57s)
01:02 awight: update crm from 3d657972029ea221b321470102c99ad74027b6f7 to 28b68e23b670fe52a401659bde800b64d05e25bf
01:00 logmsgbot: maxsem Started scap: Noop, regenerating l10n cache for ZeroBanner
00:46 logmsgbot: maxsem Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/179028/ (duration: 00m 05s)
00:41 logmsgbot: maxsem Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/178990/ (duration: 00m 05s)
00:38 logmsgbot: maxsem Synchronized php-1.25wmf11/extensions/WikimediaEvents/: (no message) (duration: 00m 06s)
00:35 logmsgbot: maxsem Synchronized php-1.25wmf11/resources/Resources.php: https://gerrit.wikimedia.org/r/#/c/179014/ (duration: 00m 06s)
00:34 logmsgbot: maxsem Synchronized php-1.25wmf11/extensions/Flow/: https://gerrit.wikimedia.org/r/#q,179018,n,z (duration: 00m 07s)
00:33 logmsgbot: maxsem Synchronized php-1.25wmf11/extensions/Flow/: https://gerrit.wikimedia.org/r/#q,179020,n,z (duration: 00m 07s)
00:31 logmsgbot: maxsem Synchronized php-1.25wmf12/extensions/WikimediaEvents/: https://gerrit.wikimedia.org/r/#q,179018,n,z (duration: 00m 05s)
00:15 logmsgbot: maxsem Synchronized php-1.25wmf11/extensions/MobileFrontend/: (no message) (duration: 00m 06s)
00:13 logmsgbot: maxsem Synchronized php-1.25wmf12/extensions/MobileFrontend/: (no message) (duration: 00m 08s)
00:00 logmsgbot: yurik Finished scap: ZeroBanner had some i18n changes, plus bits seems to be out of sync for it (duration: 20m 01s)

December 10

23:40 logmsgbot: yurik Started scap: ZeroBanner had some i18n changes, plus bits seems to be out of sync for it
23:19 logmsgbot: yurik Synchronized php-1.25wmf12/extensions/ZeroPortal/: updating ZeroPortal to master (duration: 00m 06s)
23:19 logmsgbot: yurik Synchronized php-1.25wmf12/extensions/ZeroBanner/: updating ZeroBanner to master (duration: 00m 05s)
23:14 logmsgbot: yurik Synchronized php-1.25wmf11/extensions/ZeroBanner/: updating ZeroBanner to master (duration: 00m 06s)
22:38 logmsgbot: bd808 Synchronized wmf-config/logging-labs.php: Beta monolog config (I76d9953) (duration: 00m 05s)
21:20 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.25wmf12
21:16 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias to 1.25wmf11
21:15 Reedy: manually ran scap-rebuild-cdbs on mw1176
21:10 logmsgbot: reedy Finished scap: testwiki to 1.25wmf12 (duration: 43m 21s)
21:07 Reedy: got 21:05:27 sudo -u mwdeploy -n -- /srv/deployment/scap/scap/bin/scap-rebuild-cdbs on mw1176 returned [255]: Error reading response length from authentication socket. Permission denied (publickey). from mw1176
20:27 logmsgbot: reedy Started scap: testwiki to 1.25wmf12
18:32 cmjohnson: replacing disk slot 4 db1015
18:26 cmjohnson: replacing disk 0 db1010
16:53 godog: reinstalling graphite1001 as graphite1002
16:28 Coren: authdns-update to merge in https://gerrit.wikimedia.org/r/178860
16:23 godog: swapping sdm on ms-be2013 / ms-be2014 / ms-be2015
15:28 ottomata: initiated replica election since analytics1021 timed out zk connection again (I had hoped we were done with this :( )
15:05 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1037, warm up (duration: 00m 09s)
14:00 paravoid: running dpkg --remove-architecture i386 (trusty); rm /etc/dpkg/dpkg.cfg.d/multiarch (precise) across the whole fleet with the exception of gallium/lanthanum
11:46 _joe_: cleaning and vacuuming the HHVM cache on a few hosts
09:00 _joe_: cleaning and vacuuming the hhvm repo on mw1030
08:38 logmsgbot: ori Synchronized php-1.25wmf11/extensions/CommonsMetadata: (no message) (duration: 00m 07s)
08:10 logmsgbot: ori Synchronized php-1.25wmf11/extensions/CommonsMetadata/TemplateParser.php: Update CommonsMetadata for cherry-picks (duration: 00m 05s)
07:49 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1007 (duration: 00m 05s)
07:18 springle: upgrade db1007 trusty
05:19 springle: s6 xtrabackup clone db1015 to db1037
05:09 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1037 (duration: 00m 06s)
04:06 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Dec 10 04:06:39 UTC 2014 (duration 6m 38s)
02:26 logmsgbot: LocalisationUpdate completed (1.25wmf11) at 2014-12-10 02:26:38+00:00
02:26 logmsgbot: l10nupdate Synchronized php-1.25wmf11/cache/l10n: (no message) (duration: 00m 01s)
02:19 logmsgbot: LocalisationUpdate completed (1.25wmf10) at 2014-12-10 02:19:10+00:00
02:19 logmsgbot: l10nupdate Synchronized php-1.25wmf10/cache/l10n: (no message) (duration: 00m 03s)
01:59 ori: manually ran /etc/cron.daily/logrotate on fluorine
01:58 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1015 RT 9027 (duration: 00m 06s)
01:46 awight: update crm from d22dce0a375be3c5f32afc472fff550a5edf6a1e to 3d657972029ea221b321470102c99ad74027b6f7
01:31 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1039, warm up (duration: 00m 05s)
01:12 logmsgbot: ori Synchronized php-1.25wmf11/extensions/WikimediaEvents: Ie9ca5d3: Update WikimediaEvents for cherry-picks (duration: 00m 07s)
00:58 awight: update crm from 94997a37a6531f2f1d5074895d5fa2da947e03f0 to d22dce0a375be3c5f32afc472fff550a5edf6a1e
00:54 logmsgbot: catrope Synchronized php-1.25wmf11/extensions/VisualEditor: SWAT (duration: 00m 06s)
00:54 logmsgbot: catrope Synchronized php-1.25wmf11/extensions/WikimediaEvents: SWAT (duration: 00m 05s)
00:54 springle: upgrade db1039 trusty
00:47 logmsgbot: ori Synchronized php-1.25wmf10/extensions/Math: Ic438b307a3b46: Fix for fatal caused by static call to MathRenderer::getError (duration: 00m 07s)
00:42 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: Set $wgWMEStatsdBaseUri for WikimediaEvents / statsv (duration: 00m 07s)
00:34 logmsgbot: catrope Synchronized php-1.25wmf11/extensions/WikimediaEvents: SWAT: sendBeacon experiment (duration: 00m 05s)
00:34 logmsgbot: catrope Synchronized php-1.25wmf10/extensions/WikimediaEvents: SWAT: sendBeacon experiment (duration: 00m 06s)
00:18 K4-713: updated payments to 21afbcf24c0e2124f78
00:17 logmsgbot: catrope Synchronized php-1.25wmf11/includes/api/ApiOpenSearch.php: SWAT: fix empty LinkBatch in opensearch (duration: 00m 05s)
00:16 logmsgbot: catrope Synchronized wmf-config/InitialiseSettings.php: Subpages for Archive talk on officewiki (duration: 00m 06s)

December 9

23:56 logmsgbot: ebernhardson Synchronized wmf-config/InitialiseSettings.php: Flow enable the non NS_* talk namespaces (duration: 00m 07s)
23:38 logmsgbot: ebernhardson Synchronized wmf-config/: Disable LQT on officewiki (duration: 00m 05s)
23:33 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1039 (duration: 00m 06s)
23:04 logmsgbot: ebernhardson Synchronized wmf-config/InitialiseSettings.php: Flow enable the rest of officewiki talk namespaces (duration: 00m 09s)
22:57 logmsgbot: ori Synchronized php-1.25wmf11/includes/api/ApiStashEdit.php: (no message) (duration: 00m 05s)
22:55 logmsgbot: ebernhardson Synchronized wmf-config/InitialiseSettings.php: Flow enable category talk namespace on officewiki (duration: 00m 08s)
22:53 logmsgbot: ebernhardson Synchronized php-1.25wmf11/extensions/Flow: Bump flow in 1.25wmf11 for officewiki import fixes (duration: 00m 07s)
22:49 logmsgbot: ebernhardson Synchronized wmf-config/InitialiseSettings.php: Flow enable 4 pages on cawiki (duration: 00m 05s)
22:34 logmsgbot: aaron Synchronized php-1.25wmf11/includes/page/WikiPage.php: dff1662755d828675e5ae119b1987ace10865693 (duration: 00m 06s)
22:33 logmsgbot: aaron Synchronized php-1.25wmf11/includes/api/ApiStashEdit.php: dff1662755d828675e5ae119b1987ace10865693 (duration: 00m 06s)
22:27 logmsgbot: ebernhardson Synchronized wmf-config/InitialiseSettings.php: enable flow on three namespaces on officewiki (duration: 00m 06s)
22:24 logmsgbot: ebernhardson Synchronized wmf-config/InitialiseSettings.php: enable flow on three namespaces on officewiki (duration: 00m 05s)
22:20 logmsgbot: ori Synchronized php-1.25wmf11/includes/page/WikiPage.php: undo: (hack) $useCache = true (duration: 00m 07s)
22:18 logmsgbot: ori Synchronized php-1.25wmf11/includes/page/WikiPage.php: (hack) $useCache = true (duration: 00m 06s)
22:17 logmsgbot: ebernhardson Synchronized php-1.25wmf11/extensions/Flow/: Push flow updates for officewiki deploy (duration: 00m 08s)
22:11 logmsgbot: ebernhardson Synchronized wmf-config/InitialiseSettings.php: enable flow on cawiki (duration: 00m 06s)
22:01 logmsgbot: ori Synchronized php-1.25wmf11/includes/api/ApiStashEdit.php: I5c296325: Various edit stash fixes (duration: 00m 06s)
20:53 legoktm: ran update revision set rev_page="8555535" where rev_page="6628330"; on frwiki
20:48 legoktm: ran update revision set rev_page="8555529" where rev_page="1469156"; on frwiki (for T76979)
20:46 YuviPanda: started /usr/local/bin/dumpwikidatajson.sh on snapshot1003 per hoo, after killing php processes from earlier start as well as from the earlier botched kill
20:44 ottomata: renaming all webrequest varnishkafka instances
20:37 YuviPanda: started /usr/local/bin/dumpwikidatajson.sh on snapshot1003 per hoo, to re-start dump script aborted earlier
20:33 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 06s)
20:28 logmsgbot: reedy Synchronized multiversion/: cdb bump (duration: 00m 05s)
20:25 Reedy: ran sync-common on mw1203
20:24 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: non wikipedias to 1.25wmf11
20:20 _joe_: repooled the last api servers
20:18 Coren: gave a+r to /etc/ssh/ssh_known_hosts on tin and iron
20:15 Reedy: mw1203 seems to be down
20:14 logmsgbot: reedy Synchronized wmf-config/CommonSettings.php: touch (duration: 01m 07s)
20:12 logmsgbot: reedy Synchronized wmf-config/CommonSettings.php: touch (duration: 01m 03s)
20:00 awight: update crm from 77e99a530b7c3910ca521923d97830df08a4d1b1 to 94997a37a6531f2f1d5074895d5fa2da947e03f0
19:25 Reedy: that is a lie, 266 hosts failed
19:24 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: non wikipedias to 1.25wmf11
18:21 _joe_: api 100% on HHVM now
18:21 _joe_: depooling mw1207-8, repooling mw1203-06
17:31 csteipp: patched for T77624
17:27 logmsgbot: csteipp Synchronized php-1.25wmf11/extensions/Listings/Listings.body.php: (no message) (duration: 00m 07s)
17:07 godog: powercycle ms-be1012, no console
16:42 _joe_: depooling mw1201-04
16:41 _joe_: repooling mw1194-1200
16:34 logmsgbot: anomie Synchronized wmf-config/CommonSettings-labs.php: Deploy some Labs-only changes so they're not showing as undeployed (duration: 00m 05s)
16:31 logmsgbot: anomie Synchronized php-1.25wmf11/extensions/Wikidata/: SWAT: Fix issue with json dump and sites caching in Wikidata gerrit:178533 (duration: 00m 15s)
16:29 logmsgbot: anomie Synchronized php-1.25wmf10/includes/filerepo/file/File.php: SWAT: Fix for broken thumbnails when the file width is in $wgThumbnailBucket gerrit:178529 (duration: 01m 04s)
16:18 logmsgbot: anomie Synchronized php-1.25wmf11/includes/filerepo/file/File.php: SWAT: Fix for broken thumbnails when the file width is in $wgThumbnailBucket gerrit:178531 (duration: 00m 08s)
15:36 qchris: restarted EventLogging's m2 writer on vanadium. Events did not get written into the database.
15:22 _joe_: repooling mw1190-93, depooling mw1194-1200
14:59 _joe_: repooled mw1147-48
14:42 YuviPanda: killed wikidata dump process (/usr/local/bin/dumpwikidatajson.sh) per hoo
13:59 _joe_: repooling mw1140-46, depooling mw114[78], mw119[0-3]
13:00 _joe_: repool mw1133-39, depooling mw1140-46
12:01 _joe_: repooling mw1125-1132, depooling mw1133-39
11:30 godog: restarting diamond on trusty hosts via salt
11:19 Reedy: [10:43:54] <_joe_> !log repooling mw1120-25, depooling mw1126-32
11:15 springle: repool db1010, warm up
11:15 springle: kicked morebots
08:13 _joe_: depooling mw1115-1119 from the api pool, reimaging
08:06 _joe_: restarting diamond on all appservers
06:01 logmsgbot: ori Synchronized php-1.25wmf11/extensions/Math/MathInputCheckTexvc.php: Fix for fatal caused by static call to MathRenderer::getError (duration: 00m 06s)
06:01 logmsgbot: ori Synchronized php-1.25wmf10/extensions/Math/MathInputCheckTexvc.php: Fix for fatal caused by static call to MathRenderer::getError (duration: 00m 06s)
04:27 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Dec 9 04:27:31 UTC 2014 (duration 27m 30s)
04:18 springle: upgrade db1010 trusty
03:33 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1010 (duration: 00m 08s)
02:23 logmsgbot: LocalisationUpdate completed (1.25wmf11) at 2014-12-09 02:23:35+00:00
02:23 logmsgbot: l10nupdate Synchronized php-1.25wmf11/cache/l10n: (no message) (duration: 00m 01s)
02:17 logmsgbot: LocalisationUpdate completed (1.25wmf10) at 2014-12-09 02:17:11+00:00
02:17 logmsgbot: l10nupdate Synchronized php-1.25wmf10/cache/l10n: (no message) (duration: 00m 03s)
01:49 logmsgbot: ori Synchronized wmf-config/InitialiseSettings.php: Set $wgAjaxEditStash to false while API cluster is on Zend (duration: 00m 06s)
01:24 K4-713: updated payments to 0e92713c0d6e5
00:13 logmsgbot: maxsem Synchronized php-1.25wmf10/extensions/Wikidata/: https://gerrit.wikimedia.org/r/#q,178374,n,z (duration: 00m 13s)
00:12 logmsgbot: maxsem Synchronized php-1.25wmf10/extensions/VisualEditor/: https://gerrit.wikimedia.org/r/#q,178371,n,z (duration: 00m 07s)
00:09 logmsgbot: maxsem Synchronized php-1.25wmf11/extensions/Wikidata/: https://gerrit.wikimedia.org/r/#q,178375,n,z (duration: 00m 13s)
00:09 logmsgbot: maxsem Synchronized php-1.25wmf11/extensions/VisualEditor/: https://gerrit.wikimedia.org/r/#q,178372,n,z (duration: 00m 07s)
00:08 logmsgbot: maxsem Synchronized php-1.25wmf11/extensions/MobileFrontend/: https://gerrit.wikimedia.org/r/#q,177942,n,z (duration: 00m 08s)
00:02 logmsgbot: maxsem Synchronized wmf-config/: https://gerrit.wikimedia.org/r/178240 (duration: 00m 05s)

December 8

23:15 K4-713: revlocked payments wiki to 30f15865bc4efe3b2b
22:09 awight: update crm from c9b733f0963a04ab1174ede0d5641e9b884747c8 to 77e99a530b7c3910ca521923d97830df08a4d1b1
21:52 awight: updated tools from 06e69f0bd1a1f74eb8055f5300b48ad3b78eedea to 88b57fea517d2232e8ae906df550f426b6574f24
20:20 awight: updated crm adfbbecbf949932932a3b6bc8c20c15e2a8054b2 to c9b733f0963a04ab1174ede0d5641e9b884747c8
19:40 logmsgbot: krinkle Synchronized php-1.25wmf11/resources/src/startup.js: touch for T47877 (duration: 00m 06s)
19:19 csteipp: deployed patches for T77028 and T76686
19:13 logmsgbot: ori Finished scap: I5a7e258d2: Optimize how user options are delivered to the client (duration: 26m 45s)
18:46 logmsgbot: ori Started scap: I5a7e258d2: Optimize how user options are delivered to the client
18:04 YuviPanda: removed restbase/ from graphite for T77172 on tungsten
17:54 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: Unset \$wgTidyInternal (duration: 00m 07s)
17:52 manybubbles: rebuilding eswiki's cirrus index to pick up fix for slow prefix searches
17:02 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: enable tidy extension on mw1081 (duration: 00m 06s)
16:56 ottomata: doing controlled restart of kafka broker analytics1021, and then initiating replica election to bring it back into leadership
16:21 hashar: Jenkins: disconnected / reconnected gallium slave from the web interface. It was locked not being able to run the mediawiki/vagrant postmerge doc job
16:08 logmsgbot: demon Synchronized php-1.25wmf11/extensions/VisualEditor: (no message) (duration: 00m 06s)
16:08 logmsgbot: demon Synchronized php-1.25wmf10/extensions/VisualEditor: (no message) (duration: 00m 09s)
15:03 hashar: Broke zuul-cloner by mistake
14:36 godog: reboot graphite1001 for kernel upgrade
13:11 hashar: Restarting zuul and zuul-merger on gallium
13:10 hashar: Zuul: rebasing our fork to bring some upstream changes
10:41 godog: upload diamond 3.5-2 to trusty-wikimedia
09:18 andrewbogott: the failure looked like this: "Unexpected error in mod_passenger: Could not connect to the ApplicationPool server: Broken pipe (32)"
09:18 andrewbogott: graceful'd apache on virt1000 -- resolving a mysterious puppetmaster outage
05:08 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1049, warm up (duration: 00m 06s)
04:41 springle: upgrade db1049 trusty
04:17 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1049 (duration: 00m 06s)
04:16 springle: also truncated puppet fact_values, see http://projects.puppetlabs.com/issues/9225 and https://tickets.puppetlabs.com/browse/PUP-1173
04:11 springle: puppet fact_values hit auto_inc limit. altered table to restart from 1 to get puppet running (seems safe, but needs checking, maybe also truncate)
03:42 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Dec 8 03:42:28 UTC 2014 (duration 42m 27s)
02:15 logmsgbot: LocalisationUpdate completed (1.25wmf11) at 2014-12-08 02:15:53+00:00
02:15 logmsgbot: l10nupdate Synchronized php-1.25wmf11/cache/l10n: (no message) (duration: 00m 01s)
02:10 logmsgbot: LocalisationUpdate completed (1.25wmf10) at 2014-12-08 02:10:20+00:00
02:10 logmsgbot: l10nupdate Synchronized php-1.25wmf10/cache/l10n: (no message) (duration: 00m 03s)

December 7

03:41 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Dec 7 03:41:52 UTC 2014 (duration 41m 51s)
02:16 logmsgbot: LocalisationUpdate completed (1.25wmf11) at 2014-12-07 02:16:30+00:00
02:16 logmsgbot: l10nupdate Synchronized php-1.25wmf11/cache/l10n: (no message) (duration: 00m 02s)
02:11 logmsgbot: LocalisationUpdate completed (1.25wmf10) at 2014-12-07 02:11:07+00:00
02:11 logmsgbot: l10nupdate Synchronized php-1.25wmf10/cache/l10n: (no message) (duration: 00m 01s)

December 6

22:51 ori: restarted apache on palladium
19:48 Krinkle: Made trivial edit to Jenkins language config to purge the French invasion (default language: en-us -> en-US)
19:39 Krinkle: Jenkins has been conquered by the French again
03:42 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Dec 6 03:42:37 UTC 2014 (duration 42m 36s)
02:18 logmsgbot: LocalisationUpdate completed (1.25wmf11) at 2014-12-06 02:18:52+00:00
02:18 logmsgbot: l10nupdate Synchronized php-1.25wmf11/cache/l10n: (no message) (duration: 00m 01s)
02:13 logmsgbot: LocalisationUpdate completed (1.25wmf10) at 2014-12-06 02:13:12+00:00
02:13 logmsgbot: l10nupdate Synchronized php-1.25wmf10/cache/l10n: (no message) (duration: 00m 01s)
01:00 logmsgbot: demon Synchronized w/robots.php: better mtime (duration: 00m 06s)
00:49 logmsgbot: ori Synchronized wmf-config/InitialiseSettings.php: Add preview log group (duration: 00m 06s)

December 5

18:22 YuviPanda: restart gitblit on antimony
18:03 akosiaris: uploaded apertium-eo-en, apertium-id-ms to apt.wikimedia.org
16:44 ottomata: rebooting analytics1021
15:51 mutante: hack-fixed http://noc.wikimedia.org/db.php
15:46 _joe_: repooling all the servers
15:06 _joe_: depooling mw1209-1220, enabling hyperthreading and upgrading
14:11 paravoid: rebooting copper
13:01 _joe_: repooling all appservers
12:32 Krinkle: Reloading Zuul to deploy I9515542a1ac2ff
11:48 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1059, warm up (duration: 00m 07s)
11:46 _joe_: repooled mw1161-mw1170, depooling mw1171-80
10:58 springle: upgrade db1059 trusty
10:56 _joe_: depooling mw1161-mw1170 for enabling hyperthreading
10:42 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1059 (duration: 00m 06s)
10:24 _joe_: repooling the appservers
09:21 _joe_: depooling some appservers for maintenance/upgrades
06:56 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1027, warm up (duration: 00m 05s)
05:02 springle: upgrade db1027 trusty
04:43 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1027 (duration: 00m 07s)
04:14 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Dec 5 04:14:34 UTC 2014 (duration 14m 33s)
03:30 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1060, warm up (duration: 00m 06s)
02:55 springle: upgrade db1060 trusty
02:35 springle: manual sync-common on mw1203 (after apparently transient sync-file network error on tin)
02:26 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1060 (duration: 00m 08s)
02:20 logmsgbot: LocalisationUpdate completed (1.25wmf11) at 2014-12-05 02:20:57+00:00
02:20 logmsgbot: l10nupdate Synchronized php-1.25wmf11/cache/l10n: (no message) (duration: 00m 01s)
02:16 logmsgbot: LocalisationUpdate completed (1.25wmf10) at 2014-12-05 02:16:49+00:00
02:16 logmsgbot: l10nupdate Synchronized php-1.25wmf10/cache/l10n: (no message) (duration: 00m 01s)
02:01 logmsgbot: awight Synchronized php-1.25wmf11/extensions/CentralNotice: rollback Googlebot cloaking (duration: 00m 08s)
02:01 logmsgbot: awight Synchronized php-1.25wmf10/extensions/CentralNotice: rollback Googlebot cloaking (duration: 00m 05s)
01:31 logmsgbot: awight Synchronized php-1.25wmf10/extensions/CentralNotice: rollback CentralNotice 'improvement' (duration: 00m 05s)
01:31 logmsgbot: awight Synchronized php-1.25wmf11/extensions/CentralNotice: rollback CentralNotice 'improvement' (duration: 00m 09s)
00:42 logmsgbot: maxsem Synchronized search-redirect.php: Second attempt (duration: 00m 05s)
00:42 logmsgbot: maxsem Synchronized search-redirect.php: https://gerrit.wikimedia.org/r/#/c/177665/ (duration: 00m 06s)
00:35 logmsgbot: maxsem Synchronized wmf-config/: https://gerrit.wikimedia.org/r/#/c/177708/ (duration: 00m 07s)
00:32 logmsgbot: maxsem Synchronized wmf-config/: https://gerrit.wikimedia.org/r/#/c/177600/ (duration: 00m 06s)
00:28 logmsgbot: maxsem Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/173083/ (duration: 00m 05s)
00:22 logmsgbot: maxsem Synchronized wmf-config/: https://gerrit.wikimedia.org/r/#/c/177660/ (duration: 00m 06s)
00:17 logmsgbot: maxsem Synchronized wmf-config/: https://gerrit.wikimedia.org/r/#/c/177494/ (duration: 00m 07s)
00:09 logmsgbot: maxsem Synchronized php-1.25wmf10/extensions/Wikidata/: https://gerrit.wikimedia.org/r/#q,177693,n,z (duration: 00m 11s)
00:09 logmsgbot: maxsem Synchronized php-1.25wmf11/extensions/Wikidata/: https://gerrit.wikimedia.org/r/#q,177693,n,z (duration: 00m 14s)
00:04 logmsgbot: maxsem Synchronized php-1.25wmf11/extensions/VisualEditor/: https://gerrit.wikimedia.org/r/#q,177643,n,z (duration: 00m 07s)

December 4

22:36 logmsgbot: awight Synchronized php-1.25wmf11/extensions/CentralNotice: push CentralNotice features (duration: 00m 06s)
22:35 logmsgbot: awight Synchronized php-1.25wmf10/extensions/CentralNotice: push CentralNotice features (duration: 00m 08s)
22:01 YuviPanda: nodejs sucks.
21:36 logmsgbot: awight Synchronized wmf-config: Shortening CentralNotice close box to 1 week (duration: 00m 07s)
21:16 awight: pushed CentralNotice fixes to hide from Google
21:16 logmsgbot: awight Synchronized php-1.25wmf11/extensions/CentralNotice: Hide CentralNotice banners from Google (duration: 00m 07s)
21:16 logmsgbot: awight Synchronized php-1.25wmf10/extensions/CentralNotice: Hide CentralNotice banners from Google (duration: 00m 06s)
21:08 ori: restarted ganglia-monitor on mw1081
20:56 legoktm: removeInvalidEmails.php finished, removed a total of 218,598 emails
20:17 logmsgbot: awight Synchronized robots.txt: Disallow banner stuff in robots.txt (duration: 00m 07s)
19:16 legoktm: running removeInvalidEmails.php across all wikis
18:57 logmsgbot: demon Synchronized wmf-config/StartProfiler.php: (no message) (duration: 00m 05s)
18:33 legoktm: ran removeInvalidEmails.php on testwiki
18:28 bd808: Rolling restart of the elasticsearch cluster for logstash did not fix corrupted logstash-2014.11.30 index. It was worth a shot.
18:23 bd808: restarted elasticsearch on logstash1002
17:28 bd808: restarted elasticsearch on logstash1003 to see if the missing indices would recover
16:28 bd808: restarted elasticsearch on logstash1001.
14:40 godog: upgrade to diamond 3.5 on trusty hosts in esams
14:31 godog: upgrade to diamond 3.5 on trusty hosts in esams
14:14 hashar: Jenkins: haven't had to restart it; I cancelled a few jobs and it went back to processing jobs.
14:12 hashar: Jenkins in deadlock, restarting it ( https://phabricator.wikimedia.org/T72597 )
13:45 _joe_: repooling mw1081
13:10 _joe_: depooling mw1081 to activate hyperthreading
12:09 paravoid: powercycling rhenium, kernel locked up
11:24 godog: upgrade to diamond 3.5 on trusty hosts in ulsfo
10:44 godog: upgrade diamond to 3.5 on all trusty machines in codfw
10:33 godog: test-upgrade diamond 3.5 on swift in codfw
10:01 ori: disabling puppet on mw1081 and restarting hhvm with hhvm.server.stat_cache = true to observe impact
09:47 godog: upload diamond 3.5-1wmf1 to trusty-wikimedia
08:12 akosiaris: disable puppet on carbon. Playing with partman :)
05:13 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: confirmedit disabled on closed wikis (duration: 00m 05s)
04:23 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Dec 4 04:23:26 UTC 2014 (duration 23m 25s)
04:01 Tim: on mw1189: restarting hhvm
02:55 logmsgbot: krinkle Synchronized w/: Ifbfb7dfd8fc0cd822b0 and I6594bc82b9de (duration: 00m 05s)
02:32 logmsgbot: LocalisationUpdate completed (1.25wmf11) at 2014-12-04 02:32:18+00:00
02:32 logmsgbot: l10nupdate Synchronized php-1.25wmf11/cache/l10n: (no message) (duration: 00m 01s)
02:20 logmsgbot: LocalisationUpdate completed (1.25wmf10) at 2014-12-04 02:20:47+00:00
02:20 logmsgbot: l10nupdate Synchronized php-1.25wmf10/cache/l10n: (no message) (duration: 00m 01s)
02:09 springle: labsdb1001 upgrade & reboot
02:05 awight: update crm from cd936bb433e9f107d860fb6e3da44c2ca2cb7742 to adfbbecbf949932932a3b6bc8c20c15e2a8054b2
01:56 springle: labsdb1002 upgrade & reboot
01:26 springle: labsdb1003 upgrade & reboot
00:40 logmsgbot: maxsem Synchronized php-1.25wmf11/extensions/Wikidata/: (no message) (duration: 00m 13s)
00:39 logmsgbot: maxsem Synchronized php-1.25wmf10/extensions/Wikidata/: (no message) (duration: 00m 12s)
00:25 logmsgbot: maxsem Synchronized php-1.25wmf11/maintenance/removeInvalidEmails.php: https://gerrit.wikimedia.org/r/#/c/177021/ (duration: 00m 05s)
00:25 logmsgbot: maxsem Synchronized php-1.25wmf10/maintenance/removeInvalidEmails.php: https://gerrit.wikimedia.org/r/#/c/177021/ (duration: 00m 05s)
00:21 logmsgbot: maxsem Synchronized wmf-config/: https://gerrit.wikimedia.org/r/#/c/177271/ (duration: 00m 07s)
00:11 logmsgbot: maxsem Synchronized php-1.25wmf11/extensions/WikiGrok/: https://gerrit.wikimedia.org/r/177393 (duration: 00m 05s)
00:10 logmsgbot: maxsem Synchronized php-1.25wmf10/extensions/WikiGrok/: https://gerrit.wikimedia.org/r/177393 (duration: 00m 08s)
00:04 logmsgbot: maxsem Synchronized wmf-config/: https://gerrit.wikimedia.org/r/#/c/177153/ , noop in production (duration: 00m 07s)

December 3

23:35 logmsgbot: hoo Synchronized wmf-config/Wikibase.php: Don't lookup Sites from mc for the 'languageLinkSiteGroup' setting (duration: 00m 06s)
23:19 logmsgbot: ebernhardson Synchronized php-1.25wmf10/extensions/Flow/includes/Parsoid/: (no message) (duration: 00m 05s)
23:14 K4-713: updated localsettings on payments
22:49 K4-713: localsettings change for payments-wiki-staging
22:25 YuviPanda: repooled mw1177
22:22 cscott: updated Parsoid to version 733986a6
22:11 logmsgbot: ebernhardson Synchronized wmf-config/: Flow enable NS_PROJECT_TALK on officewiki (duration: 00m 07s)
22:04 logmsgbot: demon Synchronized wmf-config/StartProfiler.php: xhprof & such (duration: 00m 05s)
21:49 cscott: updated OCG to version 08e94b19c3f17e699d7e53d9605f65c58e17ea0e
21:25 logmsgbot: reedy Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 05s)
21:02 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 07s)
20:49 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.25wmf11
20:49 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias to 1.25wmf10
20:46 logmsgbot: reedy Synchronized php-1.25wmf11/extensions/Wikidata: (no message) (duration: 00m 12s)
20:46 logmsgbot: reedy Synchronized php-1.25wmf10/extensions/Wikidata: (no message) (duration: 00m 13s)
20:43 logmsgbot: reedy Finished scap: testwiki to 1.25wmf11 (take 2) (duration: 36m 35s)
20:07 logmsgbot: reedy Started scap: testwiki to 1.25wmf11 (take 2)
19:58 logmsgbot: reedy Started scap: testwiki to 1.25wmf11 and rebuild l10n caches
18:58 YuviPanda: manually killed parsoid on wtp1017, restarted with service parsoid restart
18:57 YuviPanda: manually killed parsoid on wtp1009, restarted with service parsoid restart
18:29 subbu: restarted parsoid to clear any cached v2 api state to prevent leakage into v1 api requests
17:06 _joe_: repooling mw1108-1113
16:53 YuviPanda: repooling mw1183 mw1172 mw1170 as hhvm
16:48 ottomata: starting upgrade of analytics1027 to trusty, hive and oozie are offline for a bit
16:41 ottomata: starting trusty upgrade of analytics1027
16:41 logmsgbot: anomie Synchronized w/robots.php: Committed live hack (for real this time) (duration: 00m 05s)
16:37 logmsgbot: anomie Synchronized w/robots.php: Committed live hack (duration: 00m 05s)
16:35 logmsgbot: anomie Synchronized w/robots.php: Remove Content-Length from robots.txt (live hack for test, will commit or revert momentarily) (duration: 00m 07s)
16:27 YuviPanda: repool mw1166 mw1165 mw1164 mw1162 mw1161
16:19 logmsgbot: anomie Synchronized w/robots.php: Fix Content-Length from robots.txt (duration: 00m 06s)
15:28 YuviPanda: depooling mw1161-62 for re-imaging
15:20 akosiaris: rebooting mw1054 for kernel upgrade
15:18 _joe_: repooling mw1101-1107, depooling mw1108-1113
15:14 YuviPanda: depool mw1164-66
15:12 akosiaris: reimaging mw1149-1152
14:47 YuviPanda: repooled mw1173 mw1171 mw1169 mw1168 mw1167
14:28 akosiaris: reimaging mw1054
14:07 _joe_: depooling mw1101-mw1107
13:51 YuviPanda: depooling mw1167 for re-imaging
13:49 YuviPanda: depooling mw1168 for re-imaging
13:13 YuviPanda: depool mw1170-3 for re-imaging
13:10 YuviPanda: depool mw1177 for hhvm re-imaging
11:24 akosiaris: reimaging mw1054-mw1059
10:52 _joe_: repooling mw1088-1094, depooling mw1095-1100
09:39 _joe_: repooling mw1045, depooling 1088-1094
09:31 _joe_: repooled mw1081-1087
09:24 YuviPanda: depooled mw1045 for _joe_
09:24 YuviPanda: repooled mw1174-6,8,9, 80,81
09:01 YuviPanda: repooling mw1182
08:30 YuviPanda: depooling mw1174-mw1176
08:27 YuviPanda: depooling mw1178 for re-imaging
08:27 _joe_: depooling mw1081-1087
08:24 YuviPanda: depooling mw1179 for re-imaging
08:19 YuviPanda: depooling mw1180 for re-imaging
08:14 YuviPanda: depooling mw1181 for re-imaging
08:02 YuviPanda: depooling mw1182 for re-imaging
07:59 YuviPanda: depooling mw1183 for re-imaging
06:41 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: db1072 full load (duration: 00m 06s)
06:31 Krinkle: Reloading Zuul to deploy If499fe06e0392f4046f97f5633c08ba442649ec5
04:22 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Dec 3 04:22:41 UTC 2014 (duration 22m 40s)
03:40 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1072, warm up (duration: 00m 10s)
03:13 springle: upgrade db1072 trusty
02:32 logmsgbot: LocalisationUpdate completed (1.25wmf10) at 2014-12-03 02:32:29+00:00
02:32 logmsgbot: l10nupdate Synchronized php-1.25wmf10/cache/l10n: (no message) (duration: 00m 01s)
02:22 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1072 (duration: 00m 08s)
02:19 logmsgbot: LocalisationUpdate completed (1.25wmf9) at 2014-12-03 02:19:21+00:00
02:19 logmsgbot: l10nupdate Synchronized php-1.25wmf9/cache/l10n: (no message) (duration: 00m 02s)
00:52 logmsgbot: yurik Synchronized php-1.25wmf10/extensions/ZeroPortal: updating ZeroPortal to master (duration: 00m 07s)
00:34 logmsgbot: ebernhardson Synchronized wmf-config/: Turning on wgWikiGrokDebug on en BetaLabs (duration: 00m 06s)
00:32 logmsgbot: ebernhardson Synchronized wmf-config/PoolCounterSettings-eqiad.php: Create new pool counter for prefix searches (duration: 00m 05s)
00:31 ori: restarted apache2 on palladium
00:30 logmsgbot: ebernhardson Finished scap: Bumping flow submodule in 1.25wmf10 (duration: 38m 55s)
00:15 ejegg: updated tools from 0a2c365455d417b21f4ebccaf0e5e3fc5bdb887f to 06e69f0bd1a1f74eb8055f5300b48ad3b78eedea

December 2

23:54 ejegg: updated tool from 113dfe160b750657626e07450003cc88d3939fbd to c8f63baf134e57680fd255874455d52efb70596f
23:51 logmsgbot: ebernhardson Started scap: Bumping flow submodule in 1.25wmf10
23:32 ejegg: updated crm from 68703898b7ebfb2a038f307f17788739114806e4 to cd936bb433e9f107d860fb6e3da44c2ca2cb7742
22:45 YuviPanda: repooling mw118[4-7] as HHVM!
22:39 _joe_: likewise on mw1121, mw1200
22:34 _joe_: restarting apache on mw1110 mw1167 mw1175, stuck in apc futex
21:54 YuviPanda: depooling mw1184 for re-imaging
21:51 YuviPanda: depooling mw1185 for re-imaging
21:49 YuviPanda: depooling mw1186 for re-imaging
21:48 YuviPanda: depooling mw1187 for re-imaging
21:07 YuviPanda: re-pooled mw1188
20:43 mutante: added jdouglas to wmf LDAP group
20:14 YuviPanda: repooled mw1209
20:11 logmsgbot: aude Synchronized php-1.25wmf10/extensions/Wikidata: (no message) (duration: 00m 12s)
19:52 mutante: restarted apache on mw1111
19:49 logmsgbot: kaldari Synchronized wmf-config/mobile.php: Deprecating WikiGrok A/B test config vars (duration: 00m 09s)
19:39 logmsgbot: reedy Synchronized php-1.25wmf9/extensions/SyntaxHighlight_GeSHi/: Fix noise in production (duration: 00m 06s)
19:38 logmsgbot: kaldari Synchronized wmf-config/InitialiseSettings.php: Syncing InitialiseSettings for disabling WikiGrok on en.wiki (A/B test done) (duration: 00m 05s)
19:32 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: disable error logging (duration: 00m 05s)
19:27 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: Enable error log for next 10-15 minutes for the luls (duration: 00m 07s)
19:26 logmsgbot: reedy Synchronized wmf-config/missing.php: CDB updates (duration: 00m 06s)
19:26 logmsgbot: reedy Synchronized multiversion/: CDB updates (duration: 00m 07s)
19:24 logmsgbot: reedy Synchronized wmf-config/: Config updates (duration: 00m 06s)
19:23 logmsgbot: reedy Synchronized search-redirect.php: Fix undefined index spam (duration: 00m 06s)
19:10 logmsgbot: reedy Synchronized wmf-config/: Wikidata config updates (duration: 00m 06s)
19:10 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias to 1.25wmf10
18:51 YuviPanda: depooling mw1209 for HHVM re-imaging
18:35 _joe_: repooling mw1076-mw1080
18:08 YuviPanda: repooling mw121[0-9] as HHVM
18:06 logmsgbot: reedy Synchronized wmf-config/: noop for scap test (duration: 00m 06s)
18:06 Reedy: Reverted deployment of scap 6694d147a5b757dfbc747f0732185b014e82e9bb, scap now at b8fb82eb1834e3691287a6e24f8384c6c2259710
17:58 logmsgbot: reedy Synchronized wmf-config/: noop to test scap (duration: 00m 05s)
17:57 Reedy: Deployed scap @ 6694d147a5b757dfbc747f0732185b014e82e9bb
17:41 K4-713: updated payments to 30f15865bc4efe3b
16:58 _joe_: uploaded hhvm 3.3.0+dfsg1-1+wm5
16:23 _joe_: depooling mw1071-1080
16:02 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: SWAT enable jpg thumbnail chaining on commons (duration: 00m 06s)
15:49 YuviPanda: repooled mw1220, re-imaging to hhvm complete
15:28 cmjohnson: rebooting analytics1033 to verify bios settings
14:42 YuviPanda: depooling mw1220 for HHVM re-imaging
14:36 YuviPanda: depooling mw1220 to re-image as HHVM
11:54 godog: remove legacy symlink /home/wikipedia/syslog from lithium
11:20 _joe_: repooling mw1061,mw1066-mw1070
09:54 _joe_: repooling mw1060,mw1062-65; depooling mw1067-mw1070 for reimaging
08:00 _joe_: depooling mw1060-mw1067 for reimaging
07:37 _joe_: repooling mw1048-1052
05:11 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: Set $wgTidyInternal to false unconditionally to ease deployment of tidy extension (duration: 00m 06s)
04:23 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Dec 2 04:23:23 UTC 2014 (duration 23m 22s)
02:32 logmsgbot: LocalisationUpdate completed (1.25wmf10) at 2014-12-02 02:32:21+00:00
02:32 logmsgbot: l10nupdate Synchronized php-1.25wmf10/cache/l10n: (no message) (duration: 00m 03s)
02:19 logmsgbot: LocalisationUpdate completed (1.25wmf9) at 2014-12-02 02:19:18+00:00
02:19 logmsgbot: l10nupdate Synchronized php-1.25wmf9/cache/l10n: (no message) (duration: 00m 01s)
01:42 awight: updated dash to b3f4be0bbd6c16be64030607fd9c59cb84111429
01:37 K4-713: updated payments to c0c4bfcdb4fa625fa52
00:06 logmsgbot: maxsem Synchronized php-1.25wmf10/extensions/Wikidata/: https://gerrit.wikimedia.org/r/#/c/176837/ (duration: 00m 12s)
00:04 logmsgbot: maxsem Synchronized php-1.25wmf10/extensions/VisualEditor/: https://gerrit.wikimedia.org/r/#/c/176713/ (duration: 00m 06s)

December 1

23:12 mutante: terbium - running rsync in screen to copy wikimania videos to labstore1001
22:51 logmsgbot: aaron Synchronized wmf-config/jobqueue-eqiad.php: b13eaa3f6e287e7268951a2f7e3798f994a20b28; comment tweaks (duration: 00m 05s)
22:38 bblack: rescaled ipvs weights for text/mobile/upload/bits to 1 (there was no differential weighting), for better sh scheduler
22:34 cscott-split: updated OCG to version a06e7c186796a6ee5d5af81e93688520abdf2596
22:33 logmsgbot: awight Synchronized php-1.25wmf10/extensions/FundraiserLandingPage: push FundraiserLandingPage GeoIP fix (duration: 00m 06s)
22:33 logmsgbot: awight Synchronized php-1.25wmf9/extensions/FundraiserLandingPage: push FundraiserLandingPage GeoIP fix (duration: 00m 06s)
22:33 logmsgbot: awight Synchronized php-1.25wmf10/extensions/LandingCheck: push LandingCheck GeoIP fix (duration: 00m 06s)
22:33 logmsgbot: awight Synchronized php-1.25wmf9/extensions/LandingCheck: push LandingCheck GeoIP fix (duration: 00m 06s)
22:32 logmsgbot: awight Synchronized php-1.25wmf10/extensions/DonationInterface: push DonationInterface translations (duration: 00m 06s)
22:32 logmsgbot: awight Synchronized php-1.25wmf9/extensions/DonationInterface: push DonationInterface translations (duration: 00m 07s)
22:16 logmsgbot: awight Synchronized php-1.25wmf10/extensions/FundraiserLandingPage: push FundraiserLandingPage GeoIP fix (duration: 00m 06s)
22:16 logmsgbot: awight Synchronized php-1.25wmf9/extensions/FundraiserLandingPage: push FundraiserLandingPage GeoIP fix (duration: 00m 06s)
22:16 logmsgbot: awight Synchronized php-1.25wmf10/extensions/LandingCheck: push LandingCheck GeoIP fix (duration: 00m 05s)
22:16 logmsgbot: awight Synchronized php-1.25wmf9/extensions/LandingCheck: push LandingCheck GeoIP fix (duration: 00m 06s)
22:15 logmsgbot: awight Synchronized php-1.25wmf10/extensions/DonationInterface: push DonationInterface translations (duration: 00m 09s)
22:15 logmsgbot: awight Synchronized php-1.25wmf9/extensions/DonationInterface: push DonationInterface translations (duration: 00m 07s)
21:06 K4-713: updated payments to 00415dd54bec2d4cf0a
20:08 logmsgbot: yurik Synchronized php-1.25wmf10/extensions/ZeroPortal/: updating ZeroPortal to master (duration: 00m 05s)
20:06 logmsgbot: yurik Synchronized php-1.25wmf10/extensions/ZeroBanner/: updating ZeroBanner to master (duration: 00m 06s)
20:05 logmsgbot: yurik Synchronized php-1.25wmf9/extensions/ZeroPortal/: updating ZeroPortal to master (duration: 00m 05s)
20:04 logmsgbot: yurik Synchronized php-1.25wmf9/extensions/ZeroBanner/: updating ZeroBanner to master (duration: 00m 08s)
20:00 logmsgbot: yurik Synchronized mobilelanding.php: https://gerrit.wikimedia.org/r/#/c/175797/ (duration: 00m 06s)
19:43 logmsgbot: maxsem Synchronized php-1.25wmf10/extensions/Popups/: https://gerrit.wikimedia.org/r/#/c/176715/ (duration: 00m 06s)
19:43 logmsgbot: maxsem Synchronized php-1.25wmf9/extensions/Popups/: https://gerrit.wikimedia.org/r/#/c/176715/ (duration: 00m 05s)
19:41 MaxSem: Stashed Tim's uncommitted tidy-related changes on tin
19:19 K4-713: updated DjangoBannerStats to 3db799dc8705c728c
18:25 bblack: ulsfo LVS updated for 'sh' for SSL as well
18:22 bblack: eqiad+esams LVS back to normal, with new config for 'sh' for SSL
18:15 bblack: ditto on pybal 'sh' stuff for esams
18:10 bblack: stopping pybal on primary eqiad LVSes to test 'sh' change for SSL (already restarted for change on backup LVSes)
17:02 andrewbogott: created empty jessie-wikimedia repo on Carbon
16:38 logmsgbot: demon Synchronized wmf-config/CommonSettings.php: opensearchxml conditional include (duration: 00m 06s)
16:37 logmsgbot: anomie Synchronized php-1.25wmf9/extensions/SyntaxHighlight_GeSHi/geshi/geshi.php: SWAT: Fix highly recursive number highlighting regex in GeSHi (duration: 00m 07s)
16:35 logmsgbot: anomie Synchronized php-1.25wmf10/extensions/SyntaxHighlight_GeSHi/geshi/geshi.php: SWAT: Fix highly recursive number highlighting regex in GeSHi (duration: 00m 10s)
16:04 logmsgbot: demon Synchronized wmf-config/abusefilter.php: (no message) (duration: 00m 05s)
16:04 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 05s)
15:49 bd808: restarted logstash on logstash1001; log2udp events were not being processed
15:26 _joe_: depooling mw1047-mw1052
15:24 _joe_: repooling mw1041-mw1046
14:26 _joe_: depooling mw1041-1046
14:16 _joe_: repooling mw1036-mw1040
13:46 _joe_: removing the same files from ocg1002,3 as well
13:44 _joe_: removing cache files from ocg1001, when they're older than 3 days
09:55 _joe_: reimaging mw1033-mw1040 to HHVM, depooling from the main pool now
09:31 _joe_: upgrading hhvm to the latest version across the cluster
04:46 logmsgbot: tstarling Synchronized php-1.25wmf10/includes/parser/MWTidy.php: change previously pulled but scap was apparently not run (duration: 00m 05s)
04:44 logmsgbot: tstarling Synchronized php-1.25wmf9/includes/parser/MWTidy.php: change previously pulled but scap was apparently not run (duration: 00m 06s)
03:34 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Dec 1 03:34:57 UTC 2014 (duration 34m 56s)
02:17 logmsgbot: LocalisationUpdate completed (1.25wmf10) at 2014-12-01 02:17:56+00:00
02:17 logmsgbot: l10nupdate Synchronized php-1.25wmf10/cache/l10n: (no message) (duration: 00m 03s)
02:10 logmsgbot: LocalisationUpdate completed (1.25wmf9) at 2014-12-01 02:10:33+00:00
02:10 logmsgbot: l10nupdate Synchronized php-1.25wmf9/cache/l10n: (no message) (duration: 00m 01s)
00:51 logmsgbot: tstarling Synchronized wmf-config/StartProfiler.php: (no message) (duration: 00m 05s)

November 30

22:51 qchris: Updated EventLogging to 19c23698bc03694017d764af33307d6f035fc224 and restarted it
20:51 qchris: restarted eventlogging mysql-m2-master consumer. It seems it could no longer write to the database.
19:17 Krinkle: Disabling and relaunching Gearman connection from Jenkins.
10:28 logmsgbot: oblivian Synchronized wmf-config/jobqueue-eqiad.php: reverting to rdb1001 (duration: 00m 05s)
10:13 mark: Rebooted asw-c4-eqiad
09:13 _joe_: job queues work again
09:08 logmsgbot: oblivian Synchronized wmf-config/jobqueue-eqiad.php: changing the aggregator address as well (duration: 00m 05s)
07:27 _joe_: restarted the jobrunner service on all jobrunners
07:15 logmsgbot: oblivian Synchronized wmf-config/jobqueue-eqiad.php: (no message) (duration: 00m 05s)
05:50 ori: 3:50 UTC: switch asw-c-eqiad lost connectivity with cabinet C4. Impact: phabricator down; gap in web request logs and some perf monitoring. Job queue and Recent Changes stream OK b/c redundant servers are up.
03:41 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Nov 30 03:41:03 UTC 2014 (duration 41m 2s)
02:18 logmsgbot: LocalisationUpdate completed (1.25wmf10) at 2014-11-30 02:18:22+00:00
02:18 logmsgbot: l10nupdate Synchronized php-1.25wmf10/cache/l10n: (no message) (duration: 00m 02s)
02:11 logmsgbot: LocalisationUpdate completed (1.25wmf9) at 2014-11-30 02:11:01+00:00
02:11 logmsgbot: l10nupdate Synchronized php-1.25wmf9/cache/l10n: (no message) (duration: 00m 01s)

November 29

04:17 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Nov 29 04:17:41 UTC 2014 (duration 17m 40s)
02:25 logmsgbot: LocalisationUpdate completed (1.25wmf10) at 2014-11-29 02:25:53+00:00
02:25 logmsgbot: l10nupdate Synchronized php-1.25wmf10/cache/l10n: (no message) (duration: 00m 01s)
02:13 logmsgbot: LocalisationUpdate completed (1.25wmf9) at 2014-11-29 02:13:35+00:00
02:13 logmsgbot: l10nupdate Synchronized php-1.25wmf9/cache/l10n: (no message) (duration: 00m 01s)

November 28

13:19 YuviPanda: restarted apache on palladium, things are recovering
11:20 qchris: Updated gerrit plugin its-phabricator-from-bugzilla to 97c5f02d3ca6259488a763515251c5cc57a11a51
11:20 qchris: Updated gerrit plugin its-phabricator to 9edf90a182e43bfeea7ebbcb20d4a52b6213600d
03:39 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Nov 28 03:39:52 UTC 2014 (duration 39m 51s)
02:22 logmsgbot: LocalisationUpdate completed (1.25wmf10) at 2014-11-28 02:22:28+00:00
02:22 logmsgbot: l10nupdate Synchronized php-1.25wmf10/cache/l10n: (no message) (duration: 00m 04s)
02:10 logmsgbot: LocalisationUpdate completed (1.25wmf9) at 2014-11-28 02:10:06+00:00
02:10 logmsgbot: l10nupdate Synchronized php-1.25wmf9/cache/l10n: (no message) (duration: 00m 01s)

November 27

17:55 _joe_: restarted hhvm on mw1224, the alarm may have been lost in the puppet failure shower earlier
17:24 godog: removed /var/lib/carbon/whisper/archived/jenkins from tungsten
17:09 godog: upload txstatsd 0.7.0~bzr30-0ubuntu0+14 to precise-wikimedia on carbon
16:49 godog: upload missing txstatsd 1.0.0-1 _source package_ to carbon
16:48 godog: upload missing txstatsd 1.0.0-1 to carbon
15:48 logmsgbot: hoo Synchronized php-1.25wmf10/extensions/Wikidata/: Fixing a data model bug + enable Statements on Properties for testwikidata (duration: 00m 12s)
15:34 logmsgbot: hoo Synchronized wmf-config/Wikibase.php: Set "displayStatementsOnProperties" for wikidata/testwikidata (duration: 00m 06s)
14:48 akosiaris: upgrading librsvg throughout the fleet
04:35 springle: restarted squid3 on carbon, but glitches seem to be upstream
04:23 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Nov 27 04:23:45 UTC 2014 (duration 23m 44s)
03:45 springle: puppet failures everywhere; transient apt timeout
02:33 logmsgbot: LocalisationUpdate completed (1.25wmf10) at 2014-11-27 02:33:36+00:00
02:33 logmsgbot: l10nupdate Synchronized php-1.25wmf10/cache/l10n: (no message) (duration: 00m 01s)
02:20 logmsgbot: LocalisationUpdate completed (1.25wmf9) at 2014-11-27 02:20:52+00:00
02:20 logmsgbot: l10nupdate Synchronized php-1.25wmf9/cache/l10n: (no message) (duration: 00m 02s)
00:30 logmsgbot: catrope Synchronized php-1.25wmf10/extensions/VisualEditor: SWAT (duration: 00m 06s)
00:27 ejegg: set TY batch size=400
00:09 awight: disabled thank-you activity records
00:09 logmsgbot: catrope Synchronized php-1.25wmf10/extensions/VisualEditor: SWAT (duration: 00m 07s)
00:09 logmsgbot: catrope Synchronized php-1.25wmf9/extensions/VisualEditor: SWAT (duration: 00m 05s)
00:07 logmsgbot: catrope Synchronized wmf-config/InitialiseSettings.php: Re-enable VisualEditor on frwiktionary and svwiktionary (duration: 00m 06s)

November 26

23:57 ejegg: set TY batch size=700
23:48 YuviPanda: manually ran puppet merge on strontium, puppet merge on palladium didn't sync
23:46 ejegg: enabling TY mail send
23:45 ejegg: set TY batch size=1
23:44 ejegg: updated crm from 96f66e6b6c947c4e4c32c4a4a32dc940dc3b1d60 to 68703898b7ebfb2a038f307f17788739114806e4
23:38 hashar: Jenkins all happy after a restart. Crashing to bed
22:56 hashar: Killing Jenkins, it is deadlocked beyond repair
22:46 hashar: Jenkins still in deadlock, will hard restart Jenkins and Zuul soonish.
22:38 ejegg: disabled TY email sending
22:36 ejegg: enabled TY email sending
22:34 ejegg: enabled CiviMail record creation for TY emails
22:24 cscott: restarted ocg
22:24 logmsgbot: gwicke Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 05s)
22:22 hashar: Jenkins executors are in deadlock ( https://phabricator.wikimedia.org/T72597 )
22:17 hashar: Bah there can only be one mediawiki-core-doxygen-publish job running, with all the merges that happened on mediawiki/core due to the release, there are currently six of them in the queue. They will all be processed eventually
22:14 hashar: mediawiki/core postmerge changes are stuck because mediawiki-core-doxygen-publish refuses to start. Attempted to retrigger them by promoting a change: gallium$ zuul promote --pipeline postmerge --changes 175960,1
22:13 ejegg: updated crm from d0a51250d2bdbf3c818ec0486af284691c7a61ff to 96f66e6b6c947c4e4c32c4a4a32dc940dc3b1d60
22:08 hashar: investigating Zuul/Jenkins. Jenkins potentially has a deadlock
22:02 cscott: updated Parsoid to version 67e2596c
21:52 hashar: Restarting Gearman client. I am in a meeting, will cleanup later.
21:33 bd808: restarted logstash on logstash1001; log2udp events not being received
21:22 ejegg: disabled ty sending
21:15 hashar: Zuul stuck, restarting Gearman client
21:01 bd808: restarted elasticsearch on logstash1002 for OOM
21:00 bd808: restarted elasticsearch on logstash1003 for OOM
20:57 bd808: All three elasticsearch nodes in the logstash cluster think logstash1003 is master but logstash-2014.11.26 is not allocated on any node
20:56 ejegg: enabled queue consumers
20:51 cscott: updated OCG to version 7d8f2b8bd496464041e3ef9c092732457cc8f7ef (did not restart ocg)
20:50 logmsgbot: reedy Synchronized php-1.25wmf10: (no message) (duration: 00m 47s)
20:30 logmsgbot: reedy Purged l10n cache for 1.25wmf6
20:29 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.25wmf10
20:28 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias to 1.25wmf9
20:24 logmsgbot: reedy Finished scap: testwiki to 1.25wmf10 and build l10n cache (duration: 49m 03s)
20:14 ejegg: updated crm from e13cae8c418d29ef444899e0a70bbe03f4b7079d to d0a51250d2bdbf3c818ec0486af284691c7a61ff
20:13 ejegg: disabling queue consumers
19:35 logmsgbot: reedy Started scap: testwiki to 1.25wmf10 and build l10n cache
18:18 logmsgbot: ebernhardson Synchronized wmf-config/InitialiseSettings.php: Whitelist converted lqt pages on officewiki (duration: 00m 07s)
16:54 logmsgbot: gwicke Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 05s)
16:34 Krinkle: Changed Jenkins default language from "en_US" to "en" ("Ignore browser settings" was already enabled). Not sure why, but it's back to English now.
16:16 logmsgbot: marktraceur Synchronized wmf-config/: [SWAT] [config] 174793 Enable VisualEditor as a Beta Feature on most remaining wikis (duration: 00m 06s)
16:13 Krinkle: Jenkins is displaying everything in French (both logged-in/logged-out users alike)
16:11 logmsgbot: marktraceur Synchronized php-1.25wmf9/extensions/Flow/: [SWAT] [wmf9] 175941 "Provide user to local LQT api calls" for officewiki. (duration: 00m 08s)
12:25 godog: stopped ocg on ocg1*
12:18 godog: restarting ocg on ocg1001
12:00 godog: removing pdf files older than 14d from ocg100*
11:57 godog: removing pdf files older than 14d from ocg1001
06:48 logmsgbot: tstarling Synchronized w/oauth-headers.php: (no message) (duration: 00m 05s)
06:43 logmsgbot: tstarling Synchronized w/oauth-headers.php: (no message) (duration: 00m 06s)
06:40 logmsgbot: tstarling Synchronized live-1.5/oauth-headers.php: (no message) (duration: 00m 05s)
06:34 logmsgbot: tstarling Synchronized php-1.25wmf9/extensions/OAuth/lib/OAuth.php: (no message) (duration: 00m 05s)
06:09 logmsgbot: tstarling Synchronized php-1.25wmf9/extensions/OAuth/lib/OAuth.php: (no message) (duration: 00m 06s)
06:07 logmsgbot: tstarling Synchronized php-1.25wmf9/extensions/OAuth/lib/OAuth.php: (no message) (duration: 00m 06s)
05:15 logmsgbot: tstarling Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 06s)
04:24 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Nov 26 04:24:14 UTC 2014 (duration 24m 13s)
02:30 logmsgbot: LocalisationUpdate completed (1.25wmf9) at 2014-11-26 02:30:20+00:00
02:30 logmsgbot: l10nupdate Synchronized php-1.25wmf9/cache/l10n: (no message) (duration: 00m 01s)
02:18 logmsgbot: LocalisationUpdate completed (1.25wmf8) at 2014-11-26 02:18:29+00:00
02:18 logmsgbot: l10nupdate Synchronized php-1.25wmf8/cache/l10n: (no message) (duration: 00m 03s)
01:23 bd808: restarted logstash on logstash1001; no events from log2udp relay being recorded
00:48 logmsgbot: aaron Synchronized wmf-config/StartProfiler.php: Remove obsolete profiling settings (duration: 00m 06s)
00:33 springle: power down db2033 for reassignment to codfw frack
00:04 qchris: restarted eventlogging mysql-m2-master consumer. It seems it could no longer write to the database.

November 25

23:06 Tim: on osmium: removing stale static pcre and zip libraries in /usr/local , installed by hhvm
22:50 logmsgbot: ebernhardson Finished scap: Bump Echo and Flow in 1.25wmf9 for officewiki deployment (duration: 30m 17s)
22:20 logmsgbot: ebernhardson Started scap: Bump Echo and Flow in 1.25wmf9 for officewiki deployment
22:20 logmsgbot: ebernhardson Synchronized php-1.25wmf9/extensions/Echo/: Bump Echo in 1.25wmf9 (duration: 00m 08s)
21:55 logmsgbot: ejegg Synchronized wmf-config/CommonSettings.php: Turn CN client-side banner choice back on everywhere (duration: 00m 05s)
21:39 logmsgbot: ejegg Synchronized php-1.25wmf8/extensions/CentralNotice/: One more CentralNotice fix to get out ahead of the winter rush - wmf8 (duration: 00m 07s)
21:22 logmsgbot: ejegg Synchronized wmf-config/CommonSettings.php: Turn CN client-side banner choice back on for selected wmf9 wikis (duration: 00m 05s)
21:15 logmsgbot: ejegg Synchronized php-1.25wmf9/extensions/CentralNotice/: One more CentralNotice fix to get out ahead of the winter rush (duration: 00m 05s)
20:53 Nemo_bis: 100 % packet loss between esams and r1fra1.core.init7.net
20:16 logmsgbot: reedy Synchronized php-1.25wmf9/extensions/Wikidata: Ic070ce0beb142e100490940fddaa0bd36b8a50be (duration: 00m 14s)
20:09 logmsgbot: reedy Synchronized php-1.25wmf8/extensions/Wikidata: Ensure my sanity (duration: 00m 13s)
19:49 bd808: restarted elasticsearch on logstash1002 after OOM
19:38 logmsgbot: reedy Synchronized wmf-config/: Config updates (duration: 00m 06s)
19:32 Reedy: Created wikilove tables on zhwikivoyage
19:28 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non Wikipedias to 1.25wmf9
19:27 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non Wikipedias to 1.25wmf9
19:18 logmsgbot: reedy Synchronized php-1.25wmf8/extensions/Wikidata: I08946aac3 (duration: 00m 12s)
19:18 logmsgbot: reedy Synchronized php-1.25wmf9/extensions/CentralNotice: Ib4d23f2a588f58ef3abcbd8b0b500ad8534723cd (duration: 00m 06s)
19:17 logmsgbot: reedy Synchronized php-1.25wmf8/extensions/CentralNotice: Ib4d23f2a588f58ef3abcbd8b0b500ad8534723cd (duration: 00m 07s)
18:28 csteipp: deployed patches for T74222 and T72901
17:42 _joe_: repooled mw1019-1032,mw1053 in the appservers pool
17:13 _joe_: depooled mw1019-1032 from the hhvm pool
17:07 logmsgbot: bd808 Synchronized wmf-config/logging-labs.php: Update labs logging config (I1843dfd) (duration: 00m 06s)
17:04 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: $wgPercentHHVM = 0 (duration: 00m 05s)
17:00 logmsgbot: anomie Synchronized php-1.25wmf9/includes/api: SWAT: API: Work around wfMangleFlashPolicy() gerrit:175596 (duration: 00m 06s)
16:53 logmsgbot: anomie Synchronized php-1.25wmf9/includes: SWAT: Make calling wfMangleFlashPolicy configurable gerrit:175598 (duration: 00m 09s)
16:44 logmsgbot: bd808 Synchronized wmf-config/logging-labs.php: Update labs logging config (Ib8d8f8e) (duration: 00m 06s)
16:36 logmsgbot: bd808 Synchronized wmf-config/logging-labs.php: Update labs logging config (Iaab0047) (duration: 00m 06s)
16:24 logmsgbot: ori Synchronized wmf-config/InitialiseSettings.php: Ibd888465: Remove HHVM beta feature (duration: 00m 05s)
16:14 _joe_: pooling mw1237-1258 in the appserver pool
15:03 godog: upload bcache-tools 1.0.7-1 to carbon
12:15 _joe_: pooling mw1221-mw1226 in the API pool
06:37 YuviPanda: restarted apache on strontium, was seeing transient puppetmaster fails
06:08 mutante: in response to jenkins login issue reported by krinkle: /var/lib/jenkins/xml.config on gallium had "virt1000" value for LDAP, earlier Andrew made a switch from there to ldap-eqiad. fixed config, restarted jenkins
06:06 mutante: restarted gitblit
04:31 jgage: restarted jenkins
04:22 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Nov 25 04:22:50 UTC 2014 (duration 22m 49s)
03:53 Krinkle: Jenkins is unable to create new user sessions. Suspect LDAP is having issues.
03:16 springle: m2 db1020 rebuilt, but blocked from dbproxy1002 until replag=0
03:12 logmsgbot: awight Synchronized wmf-config: Disabling CentralNotice client banner choice due to T75812 (duration: 00m 05s)
02:31 logmsgbot: LocalisationUpdate completed (1.25wmf9) at 2014-11-25 02:31:48+00:00
02:31 logmsgbot: l10nupdate Synchronized php-1.25wmf9/cache/l10n: (no message) (duration: 00m 01s)
02:18 logmsgbot: LocalisationUpdate completed (1.25wmf8) at 2014-11-25 02:18:52+00:00
02:18 logmsgbot: l10nupdate Synchronized php-1.25wmf8/cache/l10n: (no message) (duration: 00m 01s)
02:15 mutante: old-bugzilla now behind varnish too, cert issue should be gone
02:07 bblack: all LVS back to normal runtime state w/ new SSL config
01:56 bblack: switching off pybal on primary LVS in esams for HTTPS check
01:54 bblack: switching off pybal on primary LVS in eqiad for HTTPS check
01:51 bblack: esams+eqiad backup LVS converted to new ssl config (lvs100[45] + lvs300[34])
01:43 logmsgbot: awight Synchronized php-1.25wmf9/extensions/CentralNotice: push CentralNotice updates (duration: 00m 05s)
01:42 logmsgbot: awight Synchronized php-1.25wmf8/extensions/CentralNotice: push CentralNotice updates (duration: 00m 06s)
01:21 bblack: disabling puppet on lvs[13]00[1-6] for SSL-related changes
01:15 K4-713: disabling fredge consumer
01:03 bblack: puppet back to normal on caches
00:59 logmsgbot: bd808 Synchronized wmf-config/logging-labs.php: Update labs logging config (duration: 00m 05s)
00:31 K4-713: updated payments to 3e3cda8f07af9f7f7
00:25 logmsgbot: maxsem Synchronized php-1.25wmf8/extensions/MobileFrontend/: https://gerrit.wikimedia.org/r/175611 (duration: 00m 05s)
00:22 logmsgbot: maxsem Synchronized php-1.25wmf9/extensions/MobileFrontend/: https://gerrit.wikimedia.org/r/175613 (duration: 00m 05s)
00:20 bblack: puppet disabled on prod text/mobile/bits/upload varnishes for careful SSL changes
00:07 logmsgbot: maxsem Synchronized wmf-config/logging-labs.php: https://gerrit.wikimedia.org/r/#/c/175604/ labs only (duration: 00m 05s)

November 24

23:36 ori: gallium: rm -f'd /srv/ssd/jenkins-slave/workspace/mwext-DonationInterface-testextension/src/vendor/.git/HEAD.lock
23:11 logmsgbot: legoktm Synchronized php-1.25wmf8/extensions/NavigationTiming/README.md: Update NavigationTiming https://gerrit.wikimedia.org/r/175585 (duration: 00m 05s)
23:05 logmsgbot: legoktm Synchronized php-1.25wmf9/extensions/NavigationTiming: Update NavigationTiming for https://gerrit.wikimedia.org/r/#/c/175584/ (duration: 00m 05s)
23:01 logmsgbot: legoktm Synchronized README: Updating README https://gerrit.wikimedia.org/r/175579 (duration: 00m 05s)
22:50 bblack: opening up access to labs/private repo in gerrit perms
22:44 logmsgbot: yurik Synchronized mobilelanding.php: https://gerrit.wikimedia.org/r/#/c/175550/ (duration: 00m 05s)
22:13 awight: enabled client banner choice config everywhere
22:13 logmsgbot: awight Synchronized wmf-config: Enable CentralNotice 2.5.0 client banner choice, everywhere (duration: 00m 05s)
22:11 logmsgbot: awight Synchronized php-1.25wmf9/extensions/CentralNotice: push CentralNotice updates (duration: 00m 06s)
22:10 awight: pushing CentralNotice patches
22:10 logmsgbot: awight Synchronized php-1.25wmf8/extensions/CentralNotice: push CentralNotice updates (duration: 00m 06s)
21:32 YuviPanda: restarted gitblit on antimony
21:26 andrewbogott: restarting pdns on virt1000 and labcontrol2001
21:26 andrewbogott: restarting opendj on labcontrol2001 and neptunium
21:23 andrewbogott: stopping opendj service on virt1000
21:22 andrewbogott: disabled ldap replication on virt1000
19:01 ejegg: updated tools from b537e2ec80d16b84f8e0539d4e3d78c8afef1b63 to 113dfe160b750657626e07450003cc88d3939fbd
16:38 andrewbogott: moved virt1000* certs out of /etc/ssl to verify that they are no longer used
16:30 logmsgbot: bd808 Synchronized wmf-config/logging-labs.php: Revert monolog logging config (duration: 00m 05s)
16:24 logmsgbot: manybubbles Synchronized wmf-config/: SWAT update config for stash limit in upload wizard (duration: 00m 06s)
16:22 logmsgbot: manybubbles Synchronized php-1.25wmf9/extensions/BounceHandler/: SWAT update bounce handler to use right db (duration: 00m 06s)
16:15 logmsgbot: manybubbles Synchronized wmf-config/logging-labs.php: SWAT update for labs - should be noop in production (duration: 00m 06s)
14:44 manybubbles: restarting the elasticsearch server didn't cause any hiccups. Rolling restart should be totally ok.
14:21 manybubbles: performing test restart of elastic1002 to see what a rolling restart would be like while serving enwiki's searches
12:51 _joe_: restarting mw1230, with hyperthreading enabled
12:37 qchris: Added gerrit plugin its-phabricator-from-bugzilla (f9fd2db7a62119ab9a6d1adfd3110b6e59b7a872)
10:58 godog: moved jenkins.ci under archived.jenkins.ci on tungsten, see T1075
10:42 godog: backfilling old txstatsd metrics from / to statsd/ on tungsten
03:31 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Nov 24 03:31:25 UTC 2014 (duration 31m 24s)
02:17 logmsgbot: LocalisationUpdate completed (1.25wmf9) at 2014-11-24 02:17:02+00:00
02:17 logmsgbot: l10nupdate Synchronized php-1.25wmf9/cache/l10n: (no message) (duration: 00m 01s)
02:10 logmsgbot: LocalisationUpdate completed (1.25wmf8) at 2014-11-24 02:10:21+00:00
02:10 logmsgbot: l10nupdate Synchronized php-1.25wmf8/cache/l10n: (no message) (duration: 00m 01s)

November 23

22:26 ori: depooling mw1234; flapping.
03:35 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Nov 23 03:35:48 UTC 2014 (duration 35m 47s)
02:19 logmsgbot: LocalisationUpdate completed (1.25wmf9) at 2014-11-23 02:19:23+00:00
02:19 logmsgbot: l10nupdate Synchronized php-1.25wmf9/cache/l10n: (no message) (duration: 00m 01s)
02:13 logmsgbot: LocalisationUpdate completed (1.25wmf8) at 2014-11-23 02:12:55+00:00
02:12 logmsgbot: l10nupdate Synchronized php-1.25wmf8/cache/l10n: (no message) (duration: 00m 01s)

November 22

23:02 springle: upgrade db1020 trusty, xtrabackup clone db1046 to db1020
21:21 hashar: Jenkins: disconnected/reconnected gallium slave. All executors were busy / deadlocked
20:29 springle: db1046 m2-master threadpool lockup, restarted mysqld, investigating
19:39 logmsgbot: ori Synchronized wmf-config/InitialiseSettings.php: Ifae6e0ab6: Clean up indents, comments, spacing in InitialiseSettings (duration: 00m 05s)
04:30 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Nov 22 04:30:53 UTC 2014 (duration 30m 51s)
02:33 logmsgbot: LocalisationUpdate completed (1.25wmf9) at 2014-11-22 02:33:36+00:00
02:33 logmsgbot: l10nupdate Synchronized php-1.25wmf9/cache/l10n: (no message) (duration: 00m 01s)
02:20 logmsgbot: LocalisationUpdate completed (1.25wmf8) at 2014-11-22 02:20:36+00:00
02:20 logmsgbot: l10nupdate Synchronized php-1.25wmf8/cache/l10n: (no message) (duration: 00m 01s)
00:44 hoo: Disabled login for dewiki accounts "W" and "H"

November 21

23:33 K4-713: Updated payments to 374480152a40d1b
23:28 hoo: Disabled login for dewiki account "@"
22:54 hoo: Disabled login for dewiki account "C"
22:17 logmsgbot: ejegg Synchronized php-1.25wmf8/extensions/CentralNotice/: (no message) (duration: 00m 05s)
21:45 logmsgbot: anomie Synchronized php-1.25wmf8/extensions/SecurePoll/: Backport another SecurePoll bug fix (duration: 00m 06s)
21:44 logmsgbot: anomie Synchronized php-1.25wmf9/extensions/SecurePoll/: Backport another SecurePoll bug fix (duration: 00m 06s)
21:32 logmsgbot: anomie Synchronized wmf-config/InitialiseSettings.php: Testing scap, no actual change (duration: 00m 05s)
21:25 logmsgbot: anomie Synchronized wmf-config/InitialiseSettings.php: Testing scap, no actual change (duration: 00m 06s)
21:23 logmsgbot: anomie Synchronized php-1.25wmf8/extensions/SecurePoll/: Backport SecurePoll bug fixes (duration: 00m 05s)
21:22 logmsgbot: anomie Synchronized php-1.25wmf8/extensions/SecurePoll/: Backport SecurePoll bug fixes (duration: 00m 01s)
21:18 logmsgbot: ori Synchronized php-1.25wmf9/extensions/SecurePoll: Backport SecurePoll bug fixes (duration: 00m 06s)
21:14 logmsgbot: anomie Synchronized php-1.25wmf9/extensions/SecurePoll/: Backport SecurePoll bug fixes (duration: 00m 01s)
21:10 logmsgbot: ejegg Synchronized php-1.25wmf9/extensions/CentralNotice/: (no message) (duration: 00m 07s)
20:31 logmsgbot: aaron Synchronized wmf-config/InitialiseSettings.php: Removed duplicated BounceHandler log entry (duration: 00m 05s)
18:16 chasemp: rebooting zirconium because bugzilla
17:26 hoo: Disabled login for dewiki account "K"
17:19 _joe_: apache hard restart on strontium
16:59 _joe_: restarted hhvm on mw1025, TC cache exhausted
16:43 _joe_: pooled mw1228-9
16:33 _joe_: pooling mw1236 (HHVM) into the main apache pool
16:23 _joe_: repooling mw1232-3
16:17 godog: upload carbonate 0.2.2-1 to trusty-wikimedia
15:56 _joe_: repooling mw1231
15:52 _joe_: repooling mw1230
15:45 _joe_: repooling mw1227
14:57 logmsgbot: hoo Synchronized wmf-config/InitialiseSettings.php: Add 'BounceHandler' to wgDebugLogGroups (duration: 00m 05s)
14:47 cmjohnson: mw1230-1233 down --reinstalling
14:34 logmsgbot: hoo Synchronized wmf-config/CommonSettings.php: Also whitelist IPv6 ips for bouncehandler (duration: 00m 08s)
13:48 _joe_: depooled mw1227
13:08 springle: fresh dump db1046 to db2011
11:39 logmsgbot: mark Synchronized wmf-config/InitialiseSettings.php: add openfashion.momu.be to wgCopyUploadsDomains (duration: 00m 06s)
11:05 _joe_: pooled mw1234,mw1235 in the api pool
10:40 _joe_: pooled mw1231,mw1232,mw1233 in the api pool
10:33 _joe_: pooled mw1230 in the api pool
10:29 _joe_: pooled mw1227 in the api pool
04:19 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Nov 21 04:19:30 UTC 2014 (duration 19m 29s)
02:26 logmsgbot: LocalisationUpdate completed (1.25wmf9) at 2014-11-21 02:26:31+00:00
02:26 logmsgbot: l10nupdate Synchronized php-1.25wmf9/cache/l10n: (no message) (duration: 00m 02s)
02:15 logmsgbot: LocalisationUpdate completed (1.25wmf8) at 2014-11-21 02:15:19+00:00
02:15 logmsgbot: l10nupdate Synchronized php-1.25wmf8/cache/l10n: (no message) (duration: 00m 01s)
01:24 logmsgbot: ebernhardson Synchronized php-1.25wmf8/extensions/Flow/includes/Parsoid/: Bump flow submodule in 1.25wmf8 (duration: 00m 04s)
01:19 K4-713: updated payments to 4d6afa865b5e8
01:02 ori: Updated scap to I5782e8cbe: Make the SSH user and authentication socket configurable
00:57 qchris: disabled gerrit's hooks-bugzilla plugin (See T210)
00:46 logmsgbot: catrope Synchronized php-1.25wmf9/resources/lib/oojs-ui/: SWAT (duration: 00m 03s)
00:46 logmsgbot: catrope Synchronized php-1.25wmf9/extensions/Flow: SWAT (duration: 00m 05s)
00:46 logmsgbot: catrope Synchronized php-1.25wmf9/extensions/MultimediaViewer: SWAT (duration: 00m 03s)
00:46 logmsgbot: catrope Synchronized php-1.25wmf9/extensions/VisualEditor: SWAT (duration: 00m 04s)
00:45 logmsgbot: catrope Synchronized php-1.25wmf8/resources/lib/oojs-ui/: SWAT (duration: 00m 03s)
00:45 logmsgbot: catrope Synchronized php-1.25wmf8/extensions/Flow: SWAT (duration: 00m 05s)
00:44 logmsgbot: catrope Synchronized php-1.25wmf8/extensions/MultimediaViewer: SWAT (duration: 00m 04s)
00:44 logmsgbot: catrope Synchronized php-1.25wmf8/extensions/VisualEditor: SWAT (duration: 00m 04s)
00:12 logmsgbot: catrope Synchronized wmf-config/: SWAT (again, for wmgMFCustomLogos) (duration: 00m 05s)
00:07 logmsgbot: catrope Synchronized wmf-config/: SWAT (duration: 00m 08s)

November 20

23:47 logmsgbot: maxsem Synchronized php-1.25wmf8/extensions/WikiGrok/: https://gerrit.wikimedia.org/r/174847 (duration: 00m 04s)
23:46 logmsgbot: maxsem Synchronized php-1.25wmf9/extensions/WikiGrok/: https://gerrit.wikimedia.org/r/174847 (duration: 00m 04s)
23:26 ^d: graceful'd mw1135, apc stale?
23:19 ^d: running sync-common on mw1135. out of sync?
23:13 logmsgbot: maxsem Synchronized php-1.25wmf9/extensions/MobileFrontend/: https://gerrit.wikimedia.org/r/#/c/174749/ (duration: 00m 04s)
22:49 ejegg: rolled back payments-wiki from 1e533d6dfc200e6a84f0a8418a8a1ecddb2b3aed to e3d235f881282120409e1a6ed1a3908ce9a63c26
22:31 logmsgbot: demon Synchronized wmf-config/: globaluserpage on beta, no-op sync (duration: 00m 07s)
22:27 logmsgbot: LocalisationUpdate completed (1.25wmf9) at 2014-11-20 22:27:08+00:00
22:23 logmsgbot: bd808 Synchronized php-1.25wmf9/cache/l10n: (no message) (duration: 08m 05s)
22:03 logmsgbot: LocalisationUpdate completed (1.25wmf8) at 2014-11-20 22:03:06+00:00
21:59 logmsgbot: bd808 Synchronized php-1.25wmf8/cache/l10n: (no message) (duration: 05m 05s)
21:40 bd808|deploy: Testing l10nupdate changes
21:38 ejegg: updated crm from ed3f3f8e31119eb7d52d5730ece4e22ac1dd055a to e13cae8c418d29ef444899e0a70bbe03f4b7079d
21:33 ejegg: updated payments-wiki e3d235f881282120409e1a6ed1a3908ce9a63c26 to 1e533d6dfc200e6a84f0a8418a8a1ecddb2b3aed
21:28 logmsgbot: aude Synchronized php-1.25wmf8/extensions/Wikidata: Update Wikidata - property suggester (duration: 00m 10s)
21:26 logmsgbot: aude Synchronized php-1.25wmf9/extensions/Wikidata: Update test.wikidata - property suggester (duration: 00m 10s)
21:00 ori: Updated EventLogging to 39de1d3faacc8463db7532405e8fc003b80ecb79
20:54 ejegg: updated crm from ff89895638a0dd0600b2e4c0b6adfd1b8e402df5 to ed3f3f8e31119eb7d52d5730ece4e22ac1dd055a
20:10 ejegg: updated tools from fe9b463379fac35ad5e71a57fbbb95ae39e2356e to b537e2ec80d16b84f8e0539d4e3d78c8afef1b63
20:09 yuvipanda: run chown l10nupdate:wikidev /var/lock/scap on tin, for https://gerrit.wikimedia.org/r/#/c/174784/1
19:42 logmsgbot: demon Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 04s)
19:34 jgage: restarted puppetmasters
19:30 logmsgbot: demon Finished scap: (no message) (duration: 23m 37s)
19:06 logmsgbot: demon Started scap: (no message)
18:51 logmsgbot: LocalisationUpdate completed (1.25wmf9) at 2014-11-20 18:51:39+00:00
18:39 logmsgbot: LocalisationUpdate completed (1.25wmf8) at 2014-11-20 18:39:00+00:00
18:27 andrewbogott: updated the python-openstack-wikistatus on carbon to 2014.11
18:22 logmsgbot: demon Synchronized php-1.25wmf9/extensions/ExtensionDistributor/: (no message) (duration: 00m 07s)
17:01 godog: reboot ms-be1007, xfs-induced high load
16:59 _joe_: restart apache on mw1218, stuck in an apc futex
16:38 logmsgbot: demon Synchronized php-1.25wmf9/extensions/Math: (no message) (duration: 00m 06s)
16:26 _joe_: puppet reenabled everywhere, change tested and live on all varnishes within the next 20 minutes
16:15 logmsgbot: demon Synchronized php-1.25wmf9/extensions/CirrusSearch: (no message) (duration: 00m 04s)
16:15 logmsgbot: demon Synchronized php-1.25wmf8/extensions/CirrusSearch: (no message) (duration: 00m 05s)
15:52 _joe_: disabling puppet on all caches, before a pretty large change, will be reenabled after a few tests
15:01 hashar: Restarting Jenkins AND Zuul. Beta cluster jobs are still deadlocked.
14:00 godog: restart txstatsd on tungsten to stop receiving jenkins metrics
13:04 hashar: Jenkins: restarting to remove a deadlock and unload the Statsd plugin
12:30 godog: upload carbon-c-relay to trusty-wikimedia
12:12 qchris: Restarted EventLogging mysql-m2 consumer to empty its caches
05:41 bblack: amssq31-62, cp300[12], lvs300[34], ssl300[123] all shut down for esams power event (and downtimed)
05:28 bblack: ~30m to esams power out, starting equipment shutdown and such for OE13/OE15
05:15 Tim: made myself an administrator on phabricator
04:27 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Nov 20 04:27:25 UTC 2014 (duration 27m 24s)
02:34 logmsgbot: LocalisationUpdate completed (1.25wmf9) at 2014-11-20 02:34:12+00:00
02:21 logmsgbot: LocalisationUpdate completed (1.25wmf8) at 2014-11-20 02:21:35+00:00
00:50 logmsgbot: maxsem Synchronized php-1.25wmf9/extensions/MobileFrontend/: https://gerrit.wikimedia.org/r/#/c/174613/ (duration: 00m 04s)
00:41 logmsgbot: catrope Synchronized wmf-config/InitialiseSettings.php: Change mobile wordmark image to relative URL (duration: 00m 04s)
00:34 logmsgbot: catrope Synchronized php-1.25wmf9/extensions/VisualEditor: SWAT (duration: 00m 04s)
00:34 logmsgbot: catrope Synchronized php-1.25wmf8/extensions/VisualEditor: SWAT (duration: 00m 04s)
00:26 logmsgbot: catrope Synchronized php-1.25wmf8/includes/media/: SWAT: don't apply EXIF rotation to chained thumbnails (duration: 00m 04s)
00:13 logmsgbot: catrope Synchronized wmf-config/: SWAT: temp debugging for SecurePoll (duration: 00m 04s)
00:10 logmsgbot: catrope Synchronized wmf-config/: SWAT (duration: 00m 04s)
00:09 logmsgbot: catrope Synchronized images/mobile/: SWAT: new Wikipedia wordmark for mobile (duration: 00m 03s)

November 19

22:01 ottomata: starting trusty upgrade of analytics1041
21:38 yuvipandajs: restarted txstatsd & carbon on labmon1001, recovering from missing points now
21:35 logmsgbot: reedy Synchronized wmf-config/: ContactPage for legal (duration: 00m 17s)
21:34 logmsgbot: reedy Synchronized docroot and w: (no message) (duration: 00m 15s)
21:25 ottomata: starting trusty upgrade of analytics1040
21:22 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.25wmf9
21:21 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikipedias to 1.25wmf8
21:18 logmsgbot: reedy Finished scap: testwiki to 1.25wmf9 and build l10n cache (duration: 105m 06s)
20:57 ottomata: starting trusty upgrade of analytics1039
20:23 ottomata: starting trusty upgrade of analytics1038
19:49 bblack: starting the long slow process of draining out esams traffic ahead of power maint event
19:34 ottomata: starting trusty upgrade of analytics1037
19:33 logmsgbot: reedy Started scap: testwiki to 1.25wmf9 and build l10n cache
19:30 andrewbogott: upgrading other compute hosts: virt1001-1009
19:25 ori: disarming keyholder agent on tin to test alerts
18:53 logmsgbot: reedy Synchronized php-1.25wmf8/extensions/Wikidata: Ie105a80aa776769eb0dae8a44cda0b7dbe018fb5 (duration: 00m 22s)
18:50 andrewbogott: upgrading virt1006
18:36 andrewbogott: upgrading labnet1001
18:33 andrewbogott: upgraded glance on virt1000 to version icehouse
18:29 ottomata: starting trusty upgrade of 1036
18:18 logmsgbot: reedy Synchronized wmf-config/: BounceHandler (duration: 00m 15s)
18:17 logmsgbot: reedy Synchronized php-1.25wmf8/extensions/BounceHandler/: Bump (duration: 00m 14s)
17:57 andrewbogott: disabled keystone-redis because the current package doesn't work with icehouse
16:36 andrewbogott: upgrading virt1000
16:16 andrewbogott: moved virt1000 db backup to /a/osback because it was /way/ too big to fit in my homedir
16:12 hashar: Jenkins: uninstalled Jenkins statsd plugin ( https://phabricator.wikimedia.org/T1278 ). It is overloading the statsd server with a bunch of metrics we don't care about ( https://phabricator.wikimedia.org/T1075 )
15:57 andrewbogott: backed up all labs openstack databases to virt1000:~andrew/osback/havana-db-backup.sql
15:49 andrewbogott: backing up labs configs in ~andrew/osback/<servicename>
15:48 andrewbogott: beginning upgrade of labs OpenStack from Havana to Icehouse
09:44 logmsgbot: hoo Synchronized php-1.25wmf8/extensions/Wikidata/: Fix url and commonsMedia UI editing (duration: 00m 42s)
05:44 legoktm: batchCAAntiSpoof finished with "34721605 user(s) done."
04:57 ejegg: updated tools from 419fb7aa32c6d0776056968378e358ee01985565 to fe9b463379fac35ad5e71a57fbbb95ae39e2356e
04:42 Tim: on mw1114: testing xhprof hack for T758
04:15 ori: mw1020: disabled puppet & restarted hhvm w/hhvm.eval.perf_pid_map = true to test
04:15 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Nov 19 04:15:00 UTC 2014 (duration 14m 59s)
02:26 logmsgbot: LocalisationUpdate completed (1.25wmf8) at 2014-11-19 02:26:12+00:00
02:14 logmsgbot: LocalisationUpdate completed (1.25wmf7) at 2014-11-19 02:14:12+00:00
01:28 qchris: Restarted EventLogging mysql-m2 consumer to pick up switch to dbproxy1002
01:23 logmsgbot: ori Synchronized php-1.25wmf7/extensions/SyntaxHighlight_GeSHi: I788e1beb8: Update SyntaxHighlight_GeSHi for cherry-picks (duration: 00m 05s)
01:20 logmsgbot: ori Synchronized php-1.25wmf8/extensions/SyntaxHighlight_GeSHi: Ibb0f7c24: Update SyntaxHighlight_GeSHi for cherry-picks (duration: 00m 05s)
00:48 springle: m2-master CNAME switch to dbproxy1002, and db1046 to primary backend
00:38 ori: EventLogging: deployed 423f7dd5b2b5 & restarted.
00:21 logmsgbot: maxsem Synchronized wmf-config/mobile.php: https://gerrit.wikimedia.org/r/174303 (duration: 00m 05s)
00:09 MaxSem: gracefulled apache on mw1205 (suspect an APC bug)
00:06 ori: repooled mw1205

November 18

23:34 logmsgbot: ori Synchronized wmf-config: I76f2023a1: 'Undeploy AntiBot' (duration: 00m 04s)
23:19 logmsgbot: demon Synchronized wmf-config: undeploy antibot (duration: 00m 04s)
22:46 bd808: Updated zuul config on gallium to include I511b14e (Make cdb-phpunit job non-voting)
22:27 ^d: elasticsearch: set a template to apply auto_expand_replicas 0-2 on all newly created indexes.
22:00 logmsgbot: hoo Synchronized wmf-config/Wikibase.php: Bump cache epoch (duration: 00m 07s)
21:50 logmsgbot: hoo Synchronized php-1.25wmf8/extensions/Wikidata/: Fix EntityIdLabelFormatter et al. (duration: 00m 17s)
21:33 ^d: elasticsearch: set auto_expand_replicas to 0-2 on ttmserver(-test) like other indexes for extra redundancy.
20:25 ori: re-enabling puppet on all varnishes following deployment of Iac35f2329
20:10 legoktm: running batchCAAntiSpoof.php on terbium
20:02 legoktm: ran populateGlobalRenameLogSearch.php on metawiki
19:47 ori: disabling Puppet on varnishes to push out Iac35f2329
19:40 logmsgbot: reedy Synchronized wmf-config/Wikibase.php: bump epoch (duration: 00m 13s)
19:40 ottomata: starting trusty upgrade of analytics1032
19:28 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 14s)
19:23 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias to 1.25wmf8
19:14 ottomata: starting trusty upgrade of analytics1031
19:05 ori: depooled mw1205; out of sync
19:03 logmsgbot: maxsem Synchronized php-1.25wmf7/extensions/WikiGrok/: Retry sync (duration: 00m 07s)
18:57 logmsgbot: maxsem Synchronized php-1.25wmf7/extensions/WikiGrok/: Revert (duration: 00m 04s)
18:54 logmsgbot: maxsem Synchronized php-1.25wmf7/extensions/WikiGrok/: SQL backed version (duration: 00m 04s)
18:49 logmsgbot: maxsem Synchronized php-1.25wmf8/extensions/WikiGrok/: SQL backed version (duration: 00m 05s)
18:47 ejegg: updated crm from 71ec68e8da1de289c4e7adca090c0fdbccbd8b8a to ff89895638a0dd0600b2e4c0b6adfd1b8e402df5
18:47 ejegg: updated crm
18:42 ottomata: starting trusty upgrade of analytics1030
18:42 logmsgbot: demon Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 04s)
18:34 ejegg: update crm from e9e81a828d50e8bddf98eae699c925e09b25927b to 71ec68e8da1de289c4e7adca090c0fdbccbd8b8a
18:31 MaxSem: created wikigrok_questions table on test, test2 and enwiki
18:25 logmsgbot: maxsem Synchronized php-1.25wmf7/extensions/MobileFrontend/: https://gerrit.wikimedia.org/r/#/c/173316/ (duration: 00m 04s)
18:25 logmsgbot: maxsem Synchronized php-1.25wmf8/extensions/MobileFrontend/: https://gerrit.wikimedia.org/r/#/c/173316/ (duration: 00m 08s)
18:08 ottomata: starting trusty upgrade of analytics1029
18:04 logmsgbot: aaron Synchronized wmf-config/StartProfiler.php: Added switch-logic for new Profiler config format (duration: 00m 05s)
17:40 logmsgbot: demon Synchronized php-1.25wmf8/extensions/Translate/ttmserver/ElasticSearchTTMServer.php: hack (duration: 00m 04s)
17:40 logmsgbot: demon Synchronized php-1.25wmf7/extensions/Translate/ttmserver/ElasticSearchTTMServer.php: hack (duration: 00m 04s)
17:34 _joe_: restarting icinga
17:24 logmsgbot: demon Synchronized php-1.25wmf7/extensions/Translate/scripts/ttmserver-export.php: profiling hack (duration: 00m 04s)
17:22 ottomata: starting trusty upgrade of analytics1020
17:21 logmsgbot: demon Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 05s)
17:15 logmsgbot: demon Synchronized php-1.25wmf8/extensions/Translate/scripts/ttmserver-export.php: profiling hack (duration: 00m 06s)
17:07 logmsgbot: demon Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 04s)
17:06 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 04s)
16:35 logmsgbot: anomie Synchronized php-1.25wmf8/extensions/SecurePoll: SWAT: Fix SecurePollContent handling gerrit:174125 (duration: 00m 09s)
16:32 logmsgbot: anomie Synchronized php-1.25wmf8/extensions/MultimediaViewer: SWAT: Media Viewer UI bugfixes gerrit:174116 (for real this time) (duration: 00m 09s)
16:30 godog: restarting txstatsd on tungsten to drop old metrics
16:29 logmsgbot: anomie Synchronized php-1.25wmf8/extensions/MultimediaViewer: SWAT: Media Viewer UI bugfixes gerrit:174116 (duration: 00m 09s)
16:28 ottomata: starting upgrade to trusty of analytics1017
16:27 logmsgbot: anomie Synchronized php-1.25wmf8/includes/filebackend: SWAT: Log more details about backend-fail-internal errors gerrit:174128 (duration: 00m 09s)
16:18 bblack: rubidium+eeden gdnsd upgraded to 2.1.0 (baham was already there)
16:06 manybubbles: replaying 20,000 searches at approximately the same speed that they were issued caused only marginal bounce in load (cluster load average was 13% and two machines went about 20%). We're ready from a performance standpoint. yay
16:02 logmsgbot: anomie Synchronized wmf-config/InitialiseSettings.php: Touch a random PHP file, supposedly required (duration: 00m 09s)
16:02 manybubbles: replaying some searches against cirrus to make *super* *duper* sure it won't fall over tomorrow when we enable enwiki
16:01 logmsgbot: anomie Synchronized visualeditor-default.dblist: SWAT: Enable VisualEditor by default on Catalan Wikiquote (cawikiquote) gerrit:174036 (duration: 00m 09s)
16:01 logmsgbot: anomie Synchronized visualeditor.dblist: SWAT: Enable VisualEditor by default on Catalan Wikiquote (cawikiquote) gerrit:174036 (duration: 00m 09s)
15:43 ottomata: starting trusty upgrade of analytics1016
15:32 hashar: Deleting job https://integration.wikimedia.org/ci/job/mediawiki-vendor-integration/ replaced by mediawiki-phpunit. Clearing out workspaces bug 73515
14:58 ottomata: starting upgrade to Trusty of analytics1015
14:55 springle: fail over m2 to m2-slave (db1046); investigating db1020
14:44 hashar: Gerrit web interface dead with: Cannot open ReviewDb bug 73555
04:16 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Nov 18 04:16:03 UTC 2014 (duration 16m 2s)
02:26 logmsgbot: LocalisationUpdate completed (1.25wmf8) at 2014-11-18 02:26:17+00:00
02:22 ^d: jenkins locale set from 'en' to 'en_US' since 'en' means Italian somehow.
02:14 logmsgbot: LocalisationUpdate completed (1.25wmf7) at 2014-11-18 02:14:18+00:00
01:27 Tim: on osmium: removing ori's custom kernel and rebooting
01:23 springle: temporarily reassign db1004 for phab migration tests
00:55 logmsgbot: maxsem Synchronized php-1.25wmf8/resources/lib/oojs-ui/: https://gerrit.wikimedia.org/r/#/c/174029/ (duration: 00m 04s)
00:53 bd808: Restarted logstash on logstash1002; lots of errors in the log about GELF input >128 chunks
00:50 bd808: Restarted hung logstash process on logstash1001
00:47 logmsgbot: maxsem Synchronized wmf-config: https://gerrit.wikimedia.org/r/173878 (duration: 00m 03s)
00:44 logmsgbot: maxsem Synchronized visualeditor.dblist: https://gerrit.wikimedia.org/r/172996 (duration: 00m 03s)
00:43 logmsgbot: maxsem Synchronized visualeditor-default.dblist: https://gerrit.wikimedia.org/r/172993 (duration: 00m 04s)
00:34 logmsgbot: maxsem Synchronized php-1.25wmf7/extensions/VisualEditor/: SWAT (duration: 00m 04s)
00:33 logmsgbot: maxsem Synchronized php-1.25wmf7/extensions/MobileFrontend/: SWAT (duration: 00m 04s)
00:33 logmsgbot: maxsem Synchronized php-1.25wmf7/extensions/WikiGrok: SWAT (duration: 00m 03s)
00:32 logmsgbot: maxsem Synchronized php-1.25wmf8/extensions/VisualEditor/: SWAT (duration: 00m 04s)
00:24 logmsgbot: maxsem Synchronized php-1.25wmf8/extensions/MobileFrontend/: SWAT (duration: 00m 04s)
00:23 logmsgbot: maxsem Synchronized php-1.25wmf8/extensions/WikiGrok/: (no message) (duration: 00m 04s)
00:21 logmsgbot: maxsem Synchronized php-1.25wmf8/extensions/VisualEditor/: SWAT (duration: 00m 04s)
00:21 logmsgbot: maxsem Synchronized php-1.25wmf8/extensions/Flow: SWAT (duration: 00m 05s)

November 17

23:59 logmsgbot: awight Synchronized wmf-config: Enable new CentralNotice features on mediawikiwiki (duration: 00m 04s)
23:35 logmsgbot: awight Synchronized php-1.25wmf8/extensions/CentralNotice: push CentralNotice updates (duration: 00m 05s)
22:50 cscott: updated Parsoid to version 819b2cf4
22:03 logmsgbot: awight Synchronized wmf-config: Enable new CentralNotice features on beta.wmflabs (duration: 00m 07s)
20:48 logmsgbot: ejegg Synchronized wmf-config: (no message) (duration: 00m 03s)
20:39 logmsgbot: ejegg Synchronized php-1.25wmf8/extensions/CentralNotice/: Update CentralNotice for client-side banner choice (duration: 00m 03s)
20:02 ottomata: starting upgrade of analytics1014 to trusty
17:57 ottomata: starting upgrade to trusty of analytics1013 (having trouble scheduling downtime in icinga right now)
16:38 akosiaris: upload etherpad-lite_1.4.1-1 on apt.wikimedia.org
16:38 logmsgbot: demon Synchronized php-1.25wmf8/extensions/SecurePoll/: (no message) (duration: 00m 05s)
16:38 logmsgbot: demon Synchronized php-1.25wmf7/extensions/SecurePoll/: (no message) (duration: 00m 05s)
16:11 logmsgbot: demon Synchronized php-1.25wmf8/extensions/CirrusSearch: (no message) (duration: 00m 04s)
16:11 logmsgbot: demon Synchronized php-1.25wmf7/extensions/CirrusSearch: (no message) (duration: 00m 05s)
16:09 hashar: Renamed job mediawiki-vendor-integration to mediawiki-phpunit bug 72787
16:03 logmsgbot: demon Synchronized wmf-config/CirrusSearch-common.php: more jobs (duration: 00m 04s)
14:31 hashar: Jenkins/Zuul: disconnected/reconnected Jenkins Gearman client
13:38 apergos: ran puppetstoredconfigclean.rb on db1017, it must have been missed in the rename
12:17 akosiaris: final reboot for xenon, cerium, praseodymium after a dist-upgrade -y
11:27 logmsgbot: ori Synchronized php-1.25wmf8/includes/Import.php: Icc19961fd: 'Debugging statements to try to diagnose bug 40009' (duration: 00m 08s)
11:22 logmsgbot: ori Synchronized wmf-config/InitialiseSettings.php: Ied0a7ab4b: Route Bug40009 logs to fluorine (duration: 00m 07s)
09:51 _joe_: restarting mw1187, all apache children stuck in apc_pthreadmutex_lock()
09:10 akosiaris: praseodymium reimaging
08:53 logmsgbot: oblivian Synchronized wmf-config/CommonSettings.php: Open HHVM to 25% of anons (duration: 00m 06s)
08:49 hashar: if there is any oddity with Jenkins/Zuul please poke me. I am on IRC all day today
08:39 hashar: Jenkins upgraded
08:30 akosiaris: reimaging cerium
08:21 hashar: Upgrading Jenkins
04:15 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Nov 17 04:15:03 UTC 2014 (duration 15m 2s)
02:27 logmsgbot: LocalisationUpdate completed (1.25wmf8) at 2014-11-17 02:27:25+00:00
02:15 logmsgbot: LocalisationUpdate completed (1.25wmf7) at 2014-11-17 02:15:14+00:00

November 16

14:23 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: Disable error Loggroup (duration: 00m 18s)
14:20 logmsgbot: reedy Synchronized wmf-config/CommonSettings.php: Fix Undefined index: HTTPS (duration: 00m 17s)
14:15 logmsgbot: krinkle Synchronized php-1.25wmf8/includes: I8764cf5df87b (duration: 00m 10s)
14:13 logmsgbot: reedy Synchronized rpc/RunJobs.php: (no message) (duration: 00m 16s)
14:10 logmsgbot: krinkle Synchronized php-1.25wmf7/includes: I8764cf5df87b226 (duration: 00m 10s)
13:53 logmsgbot: krinkle Synchronized wmf-config/InitialiseSettings.php: If9194b73c3256e0064ff (duration: 00m 07s)
04:09 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Nov 16 04:09:17 UTC 2014 (duration 9m 16s)
02:22 logmsgbot: LocalisationUpdate completed (1.25wmf8) at 2014-11-16 02:22:49+00:00
02:16 logmsgbot: LocalisationUpdate completed (1.25wmf7) at 2014-11-16 02:16:32+00:00

November 15

21:11 logmsgbot: reedy Synchronized database lists: update size dblists (duration: 00m 17s)
16:49 logmsgbot: reedy Synchronized docroot and w: fix typo (duration: 00m 15s)
16:48 logmsgbot: reedy Synchronized docroot and w: dbtree (duration: 00m 14s)
16:38 logmsgbot: reedy Synchronized docroot and w: dbtree (duration: 00m 14s)
14:26 YuviPanda: made reedy 'full' user on webmaster tools
12:23 logmsgbot: reedy Synchronized wmf-config/missing.php: hhvm support (duration: 00m 14s)
12:20 logmsgbot: reedy Synchronized multiversion/: CDB updates (duration: 00m 14s)
11:32 logmsgbot: reedy Synchronized wmf-config/missing.php: Fix php short tags (duration: 00m 16s)
11:31 logmsgbot: reedy Synchronized php-1.25wmf7/extensions/CommonsMetadata/: Fix warnings (duration: 00m 18s)
11:31 logmsgbot: reedy Synchronized php-1.25wmf7/extensions/cldr/: Fix warnings (duration: 00m 15s)
03:28 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Nov 15 03:28:47 UTC 2014 (duration 28m 46s)
02:16 logmsgbot: LocalisationUpdate completed (1.25wmf8) at 2014-11-15 02:16:31+00:00
02:10 logmsgbot: LocalisationUpdate completed (1.25wmf7) at 2014-11-15 02:10:32+00:00
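
Note on the duration suffixes: the sync entries above end in "(duration: 00m 17s)", while the LocalisationUpdate cache refresh entries use "(duration 28m 46s)" without the colon. A minimal sketch of pulling those durations out in seconds, assuming only these two spellings occur; the function name `duration_seconds` is made up for illustration and is not part of any deployment tooling.

```python
import re

# Accepts both "(duration: 00m 17s)" and "(duration 28m 46s)".
DURATION_RE = re.compile(r"\(duration:? (\d+)m (\d+)s\)")


def duration_seconds(entry):
    """Return the duration recorded in a log entry in seconds, or None if it has none."""
    match = DURATION_RE.search(entry)
    if match is None:
        return None
    minutes, seconds = match.groups()
    return int(minutes) * 60 + int(seconds)


if __name__ == "__main__":
    line = ("21:11 logmsgbot: reedy Synchronized database lists: "
            "update size dblists (duration: 00m 17s)")
    print(duration_seconds(line))
```

Entries without a duration suffix, such as manual notes, yield None, so they can be filtered out before summing or averaging.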

November 14

23:21 logmsgbot: kaldari Synchronized wmf-config/mobile.php: Updating WikiGrok A/B test start/end times (duration: 00m 07s)
22:41 logmsgbot: ori Synchronized wmf-config/InitialiseSettings.php: If60e3fe97: Deploy Translate extension on ca.wikimedia (duration: 00m 05s)
21:47 bd808: restarted /etc/init.d/ganglia-monitor on logstash1003
18:44 legoktm: running scripts to fix bug 72927
18:01 akosiaris: reimaging xenon
16:40 bd808: Increased replica count from 0 to 2 for all logstash elasticsearch indices. Expect icinga warnings as replicas are populated.
15:56 ottomata: upgrading analytics1024 to trusty
15:46 ottomata: analytics1003 (a cisco) is acting crazy, stuck in some loop while trying to boot. Am attempting to fix with power cycle
14:35 paravoid: cr1-ulsfo: setting up BGP with new transit provider
09:16 hashar: Zuul is back
09:14 hashar: Zuul is flapping
07:38 jgage: logstash hosts: elasticsearch moved to bigger disks
04:36 Tim: on mw1114 restarting hhvm
04:17 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Nov 14 04:17:16 UTC 2014 (duration 17m 15s)
04:03 jgage: logstash1002 migration to new md0 complete
02:30 logmsgbot: LocalisationUpdate completed (1.25wmf8) at 2014-11-14 02:30:34+00:00
02:28 Tim: progressively increasing load on mw1114, attempting to reproduce the previous overload
02:19 jgage: logstash1003 elasticsearch migration to new raid0 complete
02:18 logmsgbot: LocalisationUpdate completed (1.25wmf7) at 2014-11-14 02:18:01+00:00
01:39 logmsgbot: kaldari Synchronized wmf-config/mobile.php: updating WikiGrok A/B test times (duration: 00m 03s)
01:04 logmsgbot: kaldari Synchronized php-1.25wmf7/extensions/MobileFrontend: (no message) (duration: 00m 05s)
01:04 logmsgbot: kaldari Synchronized php-1.25wmf7/extensions/WikiGrok: (no message) (duration: 00m 03s)
00:49 jgage: logstash1003: migrating elasticsearch data to new raid volume
00:42 logmsgbot: kaldari Synchronized wmf-config/mobile.php: Update WikiGrok A/B test times (duration: 00m 03s)
00:20 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 06s)
00:15 mutante: nickel - shutdown
00:15 logmsgbot: demon Synchronized php-1.25wmf8/extensions/VisualEditor: (no message) (duration: 00m 04s)
00:14 logmsgbot: demon Synchronized php-1.25wmf7/extensions/VisualEditor: (no message) (duration: 00m 05s)
00:08 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 04s)
00:06 logmsgbot: demon Synchronized wmf-config/: (no message) (duration: 00m 07s)
00:04 logmsgbot: demon Synchronized php-1.25wmf8/extensions/Echo: (no message) (duration: 00m 04s)
00:04 logmsgbot: demon Synchronized php-1.25wmf7/extensions/Echo: (no message) (duration: 00m 06s)

November 13

23:55 mutante: nickel - remove from puppet,salt,icinga,stop services...
23:52 ^d: restarted gitblit on antimony
23:39 logmsgbot: kaldari Synchronized wmf-config/mobile.php: Adding WikiGrok A/B test start and end times (duration: 00m 03s)
22:19 jgage: hadoop: analytics1010 is again active namenode
22:14 logmsgbot: awight Synchronized php-1.25wmf8/extensions/CentralNotice: push CentralNotice updates (duration: 00m 04s)
22:13 logmsgbot: awight Synchronized php-1.25wmf7/extensions/CentralNotice: push CentralNotice updates (duration: 00m 05s)
22:12 qchris: restarted EventLogging jobs that write to disk, to pick up config changes
22:03 jgage: failed over hadoop namenode to analytics1004
21:42 logmsgbot: awight Synchronized wmf-config: Enabling CentralNotice banner choice on testwiki, take 2 (duration: 00m 06s)
21:15 cscott: updated Parsoid to version dabff010
20:51 cmjohnson: powering down logstash1001 to add disks
20:39 cmjohnson: powering down logstash1002 to add disks
20:37 awight: CentralNotice noops deployed to all wikis
20:36 logmsgbot: awight Synchronized php-1.25wmf7/extensions/CentralNotice: push CentralNotice updates (duration: 00m 05s)
20:33 logmsgbot: awight Synchronized wmf-config: Enabling CentralNotice banner choice on testwiki (duration: 00m 04s)
20:32 bd808: Dropped replica count of all logstash indices except today to 0. Should make rolling restarts faster during hardware upgrade.
20:25 logmsgbot: awight Synchronized php-1.25wmf8/extensions/CentralNotice: push CentralNotice updates (duration: 00m 05s)
20:19 csteipp: patched bugs 71111 and 71394 in wmf7 and wmf8
20:14 cmjohnson: powering down logstash1003 for a few mins to add disks
19:52 ottomata: starting upgrade to trusty on analytics1023
19:15 awight: campaigns reenabled
18:55 awight: disabling CentralNotice campaigns
17:49 ottomata: preparing for trusty upgrade of analytics1003
16:57 bd808: dropped replica count to 0 for logstash indices from 2014-10-30 and 2014-10-31.
16:49 bd808: restarted elasticsearch on logstash1002
16:46 bd808: dropped replica count to 0 for logstash indices from 2014-10-14 through 2014-10-29. See https://phabricator.wikimedia.org/P73 for the commands.
16:45 ottomata: preparing to upgrade analytics1026 to trusty
16:21 bd808: disk utilization is 94% on logstash1002, 92% on logstash1001 and 91% on logstash1003. Too much data in indices even with replica count bumped down to 1 for the small disks we have today.
16:16 bd808: logstash elasticsearch cluster is pretty messed up. logstash1002 has lost shards for all indices except for today, and it's master for that one.
16:16 logmsgbot: manybubbles Synchronized php-1.25wmf8/extensions/CirrusSearch/: SWAT update cirrussearch to fix slow prefix queries (duration: 00m 05s)
16:14 logmsgbot: manybubbles Synchronized wmf-config/CirrusSearch-production.php: SWAT reenable regex search now that it will not crash elasticsearch (duration: 00m 04s)
16:13 logmsgbot: manybubbles Synchronized wmf-config/CirrusSearch-common.php: SWAT reenable accelerated regex search (regex search still disabled) (duration: 00m 03s)
16:11 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: SWAT force summary when running checkuser query on all wikis (duration: 00m 04s)
16:01 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: SWAT revert JPG thumbnail chaining on all wikis except commons (duration: 00m 05s)
15:27 logmsgbot: hashar: deleted all content from https://doc.wikimedia.org/ :-( Will regenerate.
15:09 godog: rolling restart of object-auditor in swift codfw/eqiad to pick up changes
15:06 logmsgbot: yurik Synchronized php-1.25wmf8/extensions/ZeroPortal: updating ZeroPortal to master (duration: 01m 13s)
15:04 chasemp: phabricator upgrades T1203
14:43 logmsgbot: hashar: restarted zuul-merger on gallium
14:42 logmsgbot: hashar: restarting Jenkins and Zuul
12:45 godog: investigating high iops on swift eqiad with paravoid, stopped object-auditor on ms-be1005 and ms-be1015
11:09 hashar: resurrected morebots in #wikimedia-operations (see Morebots).
11:08 hashar: Killed Jenkins due to a deadlock
11:08 hashar: Killing Jenkins due to a deadlock
02:52 mutante: beta puppet freshness - UNKNOWN: No valid datapoints found .. since 13d
02:30 logmsgbot: LocalisationUpdate completed (1.25wmf8) at 2014-11-13 02:30:00+00:00
02:18 logmsgbot: LocalisationUpdate completed (1.25wmf7) at 2014-11-13 02:18:44+00:00
00:46 mutante: thulium - Could not intern from pson: expected value in object at '"[PHP]\n\n; puppet:t'!
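
Note on the replica count changes logged above (16:46, 16:57, 20:32): the commands actually run are in the linked paste and are not reproduced here. As a rough sketch only, the Elasticsearch index settings API takes a body of the form {"index": {"number_of_replicas": N}} on PUT /<index>/_settings. The code below builds such requests for a date range without sending anything. The daily index naming logstash-YYYY.MM.DD is an assumption (it is the Logstash default), and both helper names are hypothetical.

```python
import json
from datetime import date, timedelta


def logstash_indices(first, last):
    """Daily index names from first to last inclusive (assumed logstash-YYYY.MM.DD)."""
    days = (last - first).days
    return [(first + timedelta(days=n)).strftime("logstash-%Y.%m.%d")
            for n in range(days + 1)]


def replica_settings_request(index, replicas):
    """Build (method, path, body) for an index settings update; nothing is sent."""
    body = json.dumps({"index": {"number_of_replicas": replicas}})
    return ("PUT", "/%s/_settings" % index, body)


if __name__ == "__main__":
    # Date range taken from the 16:46 entry: 2014-10-14 through 2014-10-29.
    for name in logstash_indices(date(2014, 10, 14), date(2014, 10, 29)):
        print(replica_settings_request(name, 0))
```

Building the requests separately from sending them makes it easy to review the list of affected indices before touching a cluster that is already short on disk.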

November 12

21:59 logmsgbot: reedy Synchronized wmf-config/: Set useLegacyUsageIndex = true for Wikibase client (duration: 00m 17s)
20:57 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.25wmf8
20:55 hashar: Restarting Jenkins, deadlock on deployment-bastion
20:28 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 15s)
20:28 logmsgbot: reedy Synchronized database lists: (no message) (duration: 00m 15s)
20:25 manybubbles: restarting elastic1021 to pick up new plugins
20:21 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias to 1.25wmf7
20:13 logmsgbot: reedy Finished scap: testwiki to 1.25wmf8 and build l10n cache (duration: 53m 57s)
19:37 hoo: Made myself oauthadmin on mediawikiwiki
19:19 logmsgbot: reedy Started scap: testwiki to 1.25wmf8 and build l10n cache
19:05 mutante: installing package upgrades on bast1001 (incl. PHP version)
19:04 mutante: installing package upgrades on iron
18:38 YuviPanda: turned off yurik's zerosms cronjob on stat1002 (already discussed with him, he was ok with it being stopped until he could find time to fix it)
17:58 _joe_: gracefulling apache on problematic API hosts
17:05 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 04s)
16:51 logmsgbot: anomie Synchronized php-1.25wmf7/extensions/SecurePoll/: SWAT: SecurePoll fix for jump-text and title on create/edit gerrit:172718 (for real this time) (duration: 00m 09s)
16:48 logmsgbot: anomie Finished scap: SWAT: SecurePoll fix for jump-text and title on create/edit gerrit:172718 (duration: 22m 13s)
16:26 logmsgbot: anomie Started scap: SWAT: SecurePoll fix for jump-text and title on create/edit gerrit:172718
16:25 logmsgbot: anomie Synchronized php-1.25wmf7/extensions/MultimediaViewer/: SWAT: Backport MediaViewer options menu layout fix gerrit:172737 (duration: 00m 09s)
16:04 logmsgbot: anomie Synchronized wmf-config: SWAT: Set different ImageMetrics sampling factor for logged-in users gerrit:172720 (duration: 00m 12s)
16:01 logmsgbot: anomie Synchronized wmf-config/Wikibase.php: SWAT: Add "featured portal" badge (Q17580674) gerrit:172729 (duration: 00m 10s)
14:55 logmsgbot: oblivian Synchronized wmf-config/CommonSettings.php: Open HHVM to 20% of anons (duration: 00m 06s)
14:27 manybubbles: restarting elastic1016 to pick up new plugins.... half way done
14:24 _joe_: load test on hhvm done
13:55 godog: rolling reload of swift on ms-be1* to pick up statsd changes
13:32 godog: rolling reload of swift on ms-fe1* to pick up statsd changes
10:28 _joe_: repooling mw1189 with a reduced hhvm thread count for testing (puppet disabled, as well)
10:16 _joe_: depooling mw1189 from the api pool for reimaging
08:17 _joe_: stress testing a group of HHVM servers in anticipation for the move to 20% of traffic
03:35 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Nov 12 03:35:17 UTC 2014 (duration 35m 16s)
02:15 logmsgbot: LocalisationUpdate completed (1.25wmf7) at 2014-11-12 02:15:49+00:00
02:09 logmsgbot: LocalisationUpdate completed (1.25wmf6) at 2014-11-12 02:09:14+00:00

November 11

17:12 cscott: removed old ocg cronjobs on ocg100x; see https://bugzilla.wikimedia.org/show_bug.cgi?id=73166
16:48 logmsgbot: reedy Synchronized wmf-config: Use Texvc filter if available (duration: 00m 15s)
14:23 logmsgbot: reedy Synchronized private/PrivateSettings.php: Add $wmgVERPsecret for BounceHandler (duration: 00m 14s)
14:09 logmsgbot: reedy Synchronized php-1.25wmf7/extensions/BounceHandler/: (no message) (duration: 00m 15s)
14:04 logmsgbot: reedy Synchronized php-1.25wmf6/vendor/: (no message) (duration: 00m 15s)
14:04 logmsgbot: reedy Synchronized php-1.25wmf6/extensions/BounceHandler/: (no message) (duration: 00m 14s)
13:41 logmsgbot: reedy Synchronized wmf-config: (no message) (duration: 00m 14s)
13:35 godog: rolling reload on ms-be2* to pick up statsd changes
13:12 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias to 1.25wmf7
13:11 logmsgbot: reedy Purged l10n cache for 1.25wmf5
13:08 YuviPanda: deleting tons of junk data generated by interaction between txstatsd and the labs graphite archiver on labmon1001
04:29 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Nov 11 04:29:32 UTC 2014 (duration 29m 31s)
02:41 logmsgbot: LocalisationUpdate completed (1.25wmf7) at 2014-11-11 02:40:58+00:00
02:28 logmsgbot: LocalisationUpdate completed (1.25wmf6) at 2014-11-11 02:28:21+00:00
00:21 logmsgbot: catrope Synchronized php-1.25wmf7/extensions/Flow: SWAT (duration: 00m 05s)
00:21 logmsgbot: catrope Synchronized php-1.25wmf7/extensions/VisualEditor: SWAT (duration: 00m 04s)

November 10

23:22 paravoid: reprepro: include src:libmaxminddb, src:geoipupdate for precise/trusty
22:14 cscott: updated Parsoid to version b61475196
21:49 cscott: updated OCG to version d9855961b18f550f62c0b20da70f95847a215805
21:36 mutante: powercycling frozen stat1002
18:42 manybubbles: restarting remaining elasticsearch boxes in sequence to pick up new plugins
18:30 godog: reboot db1017 to pick up an updated kernel
18:29 logmsgbot: ori Synchronized php-1.25wmf6/includes/ChangeTags.php: Iec9befeba: Hide HHVM tag on Special:{Contributions,RecentChanges,...} (duration: 00m 05s)
18:29 logmsgbot: ori Synchronized php-1.25wmf7/includes/ChangeTags.php: Iec9befeba: Hide HHVM tag on Special:{Contributions,RecentChanges,...} (duration: 00m 06s)
17:52 manybubbles: restart elastic1002 to pick up new plugins
17:16 manybubbles: elastic1001 finished restarting. letting it soak up shards for a few minutes to make sure restart was ok. then we'll plow through the others
17:02 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 05s)
16:52 manybubbles: restarting elastic1001 to pick up new plugins.
16:50 manybubbles: deployed new versions of elasticsearch plugins to fix regex querying
16:48 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: touch (duration: 00m 14s)
16:03 logmsgbot: anomie Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable JPG thumbnail chaining on all wikis except commons gerrit:172254 (duration: 00m 09s)
16:01 logmsgbot: anomie Synchronized wmf-config/Wikibase.php: SWAT: Enable experimental Wikidata features on labs gerrit:172239 (duration: 00m 09s)
15:50 logmsgbot: reedy Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 14s)
11:54 logmsgbot: oblivian Synchronized wmf-config/CommonSettings.php: Open HHVM to 15% of anons (duration: 00m 06s)
10:06 _joe_: upgraded hhvm on the whole cluster
09:19 hashar: Restarting Jenkins to java 7
09:17 _joe_: upgrading hhvm across the fleet with new package with debug symbols
09:14 hashar: Jenkins: switching from Java 6 to Java 7 153764
09:02 _joe_: repooling mw1189 at reduced load
08:52 _joe_: dist-upgrading mw1189 to use the latest kernel available, then rebooting
08:32 YuviPanda: ran mklost+found on /srv/postgres for reducing cronspam
08:21 paravoid: force-rebooting ms-be2011, kernel "xfs stuck"
01:55 ori: depooled mw1114 after it became unresponsive, likely <https://phabricator.wikimedia.org/T1195>
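
Note on the elasticsearch plugin rollout logged above (16:52 through 18:42): the pattern described is to restart one node, let it soak up shards, then continue with the rest in sequence. A minimal sketch of that loop follows, assuming the caller supplies two hypothetical callables, `restart` and `cluster_is_healthy`; how the restart was really issued and how recovery was judged is not recorded in the log.

```python
import time


def rolling_restart(nodes, restart, cluster_is_healthy,
                    poll_seconds=30, sleep=time.sleep):
    """Restart nodes one at a time, waiting for recovery before moving on.

    restart(node) and cluster_is_healthy() are hypothetical callables
    provided by the caller; this function only sequences them.
    """
    restarted = []
    for node in nodes:
        restart(node)
        # Hold here until the cluster has reabsorbed the restarted node.
        while not cluster_is_healthy():
            sleep(poll_seconds)
        restarted.append(node)
    return restarted


if __name__ == "__main__":
    done = rolling_restart(
        ["elastic1001", "elastic1002"],
        restart=lambda node: print("restarting", node),
        cluster_is_healthy=lambda: True,
    )
    print(done)
```

Passing `sleep` in as a parameter keeps the loop easy to dry-run without waiting.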

November 9

23:44 hoo: Changed the email for a global account. Bug 73014.
21:56 _joe_: depooling mw1189 from the api pool, see https://phabricator.wikimedia.org/T1194
19:12 _joe_: restarted apache on mw1192, this time a hard restart
17:11 hoo: mw1192 stuck with almost no idle workers as most workers are in the "Gracefully finishing" state. Attempted to gracefully restart it, but that (to no surprise) didn't help.

November 8

20:17 Krinkle: Jenkins/Zuul was still stuck. Disconnected and relaunched slave agents on lanthanum and gallium. This fixed it (slaves in labs were fine).
20:01 Krinkle: Jenkins/Zuul appear stuck. Disconnect/Re-enable Gearman from Jenkins.
15:30 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: Gerrit I46b151ff: Reverting addition of Draft namespace to enwiki (duration: 00m 04s)
11:09 YuviPanda: ran mklost+found on /srv/postgres on labsdb1004 to kill cronspam
11:07 YuviPanda: ran mklost+found on /srv/postgres on labsdb1007 to kill cronspam

November 7

20:53 logmsgbot: ebernhardson Synchronized php-1.25wmf7/extensions/Flow: Bump flow submodule for bug 71858 (duration: 00m 08s)
20:33 logmsgbot: ori Synchronized php-1.25wmf6/includes/WebResponse.php: I569b2ebbc: Add WebResponse::getHeader() (duration: 00m 09s)
20:13 logmsgbot: ori Synchronized php-1.25wmf7/includes/WebResponse.php: I569b2ebbc: Add WebResponse::getHeader() (duration: 00m 07s)
20:03 YuviPanda: restarted gitblit on antimony
19:36 YuviPanda: upgraded php5-fss to 1.0-2 on virt1000 to prevent cronspam
19:36 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 04s)
16:27 godog: shut db1017 briefly for cmjohnson to look
14:46 _joe_: installing hhvm package built with full debug symbols on mw1114
09:00 logmsgbot: oblivian Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 07s)
08:00 _joe_: powercycled mw1169, console unresponsive, not responding to pings
07:52 _joe_: killed the master apache process on mw1191, stuck in a futex wait, restarted apache
07:44 _joe_: upgrading the hhvm appservers to the new package version, it seems stable enough
01:01 logmsgbot: reedy Synchronized php-1.25wmf7/extensions/Flow/: (no message) (duration: 00m 16s)
00:49 logmsgbot: reedy Synchronized php-1.25wmf6/includes/api/: (no message) (duration: 00m 14s)
00:36 logmsgbot: reedy Synchronized php-1.25wmf7/extensions/MobileFrontend/: (no message) (duration: 00m 16s)
00:27 logmsgbot: reedy Synchronized php-1.25wmf7/extensions/VisualEditor/: (no message) (duration: 00m 15s)
00:26 logmsgbot: reedy Synchronized php-1.25wmf6/extensions/GeoData: (no message) (duration: 00m 14s)
00:03 Reedy: running foreachwikiindblist wikidataclient.dblist extensions/Wikidata/extensions/Wikibase/lib/maintenance/populateSitesTable.php --strip-protocols
00:00 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: touch (duration: 00m 14s)
00:00 logmsgbot: reedy Synchronized langlist: mai (duration: 00m 14s)

November 6

23:26 bd808: deleted corrupt mediawiki/core clone in workspace/mwext-MobileFrontend-qunit-mobile on gallium
23:24 bd808: Killed 3 hung /usr/local/bin/logstash_optimize_index.sh processes on logstash1002
23:22 bd808: restarted logstash on logstash1002
23:21 bd808: restarted logstash on logstash1003
23:02 bd808: restarted logstash on logstash1001 for the usual reason (no events making it to elasticsearch)
22:19 subbu: updated parsoid to d23d2be6 (+ a hotfix to the production localsettings config file)
21:26 ottomata: added Range header field to varnishkafka webrequest logs
19:45 andrewbogott: restarted ntp on labstore1001
19:33 manybubbles: manybubbles is done with SWAT
16:11 logmsgbot: manybubbles Synchronized php-1.25wmf7/extensions/MultimediaViewer/: SWAT revert layout changes (duration: 00m 06s)
16:02 logmsgbot: manybubbles Synchronized wmf-config/: SWAT deploy some beta configs. Should be noop. (duration: 00m 04s)
15:49 _joe_: load-testing hhvm, in particular the servers with the new package
15:43 _joe_: upgrading mw1031,mw1032 to the new package, no crashes seen since reinstall
15:24 manybubbles: finished with performance testing for cirrus - new servers look like way way more than enough power
15:01 manybubbles: dewiki is fine. trying enwiki.
14:57 manybubbles: performance test for zhwiki was good. trying dewiki
14:55 manybubbles: running performance test for Cirrus taking zhwiki
12:09 akosiaris: Depool wtp1001, wtp1003-1006 for trusty upgrade
10:07 _joe_: temporary raising weight of mw1018 and 1030 in pybal to load-test them and check for crashes
09:53 _joe_: installing the new hhvm package on mw1030 and mw1018 in order to test for stability
02:05 awight: CRM: drush vset maintenance_mode 1
01:21 Tim: restarted gmond on mw1018 and mw1031
01:06 mutante: git-sync-upstream on deployment-salt for beta puppetmaster
00:56 awight: disabling all queue consumers.
00:28 logmsgbot: maxsem Synchronized php-1.25wmf6/extensions/MobileFrontend/: (no message) (duration: 00m 04s)
00:24 logmsgbot: maxsem Synchronized php-1.25wmf6/extensions/MobileFrontend/: (no message) (duration: 00m 07s)
00:20 logmsgbot: maxsem Synchronized php-1.25wmf7/extensions/VisualEditor/: SWAT (duration: 00m 07s)
00:18 logmsgbot: maxsem Synchronized php-1.25wmf6/extensions/MobileFrontend/: SWAT (duration: 00m 04s)
00:18 logmsgbot: maxsem Synchronized php-1.25wmf6/extensions/Flow/: SWAT (duration: 00m 05s)
00:13 andrewbogott: ocg1001 is depressingly tiny and will probably keep complaining about disk space until it's rebuilt
00:09 andrewbogott: cleaned up some log files on ocg1001 and reduced logrotations to 7.

November 5

22:02 ejegg: updated fraud scoring
21:24 subbu: redeployed parsoid deploy sha 66befe47 (with the right bunyan log level that unbreaks VE)
21:13 subbu: deployed parsoid version 978623eb
21:05 Reedy: Running foreachwikiindblist wikidataclient.dblist extensions/Wikidata/extensions/Wikibase/lib/maintenance/populateSitesTable.php --strip-protocols
21:02 logmsgbot: reedy Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 15s)
21:00 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: maiwiki
21:00 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: maiwiki (duration: 00m 14s)
20:59 logmsgbot: reedy Synchronized database lists: maiwiki (duration: 00m 14s)
20:56 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: maiwiki
20:56 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: maiwiki (duration: 00m 15s)
20:55 logmsgbot: reedy Synchronized database lists: maiwiki (duration: 00m 18s)
20:49 awight: turning off CiviMail activity record for each TY
20:45 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 17s)
20:39 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.25wmf7
20:37 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias to 1.25wmf6
20:34 logmsgbot: reedy Finished scap: testwiki to 1.25wmf7, build l10n cache (duration: 45m 03s)
19:49 logmsgbot: reedy Started scap: testwiki to 1.25wmf7, build l10n cache
19:48 logmsgbot: reedy scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki="fawiki" --list-file="/srv/mediawiki-staging/wmf-config/extension-list" --output="/tmp/tmp.37qNnawZ9J" --verbose' returned non-zero exit status 1 (duration: 00m 13s)
19:47 logmsgbot: reedy Started scap: testwiki to 1.25wmf7, build l10n cache
19:47 logmsgbot: reedy scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki="fawiki" --list-file="/srv/mediawiki-staging/wmf-config/extension-list" --output="/tmp/tmp.4wTY29z5Gg" ' returned non-zero exit status 1 (duration: 01m 08s)
19:46 logmsgbot: reedy Started scap: testwiki to 1.25wmf7, build l10n cache
19:17 andrewbogott: removed libvips-dev and libvips-tools from our custom repo for Trusty. The default packages seem to work fine.
18:30 andrewbogott: restarting icinga on neon
18:10 awight: disabled TY job
18:02 ^d: elastic1022 unbanned from allocation since it has a network cable again
17:14 logmsgbot: demon Synchronized wmf-config/PoolCounterSettings-eqiad.php: (no message) (duration: 00m 06s)
17:02 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: frwiki getting cirrusy search (duration: 00m 05s)
10:47 _joe_: installed hhvm 3.3.0-20140925+wmf4 on osmium for testing.
09:11 akosiaris: depool wtp1002, wtp1007-wtp1012
09:09 akosiaris: repool wtp1013,wtp1014,wtp1015,wtp1016,wtp1017
07:18 ori: rolled back cluster:appserver_hhvm to version 3.3.0-20140925+wmf3 of hhvm package
06:29 akosiaris: depool wtp1013, wtp1014, wtp1015, wtp1016, wtp1023 for trusty reinstallation
05:53 ori: ran: salt -G php:hhvm cmd.run 'restart hhvm'
05:26 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: $wgPercentHHVM: back to 5% (duration: 00m 11s)
03:21 ^d|voted: restarted lucene-search-2 on search1019: it'd been timing out for a few days and filled disk with log files.
02:25 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: If866e9caf: $wgPercentHHVM: 5 => 10, to test https://phabricator.wikimedia.org/T820#18870 (duration: 00m 04s)
01:21 logmsgbot: ori Synchronized php-1.25wmf5/extensions/MobileFrontend: Ic82ba72b98: Update MobileFrontend for cherry-picks (duration: 00m 04s)
01:21 logmsgbot: ori Synchronized php-1.25wmf6/extensions/MobileFrontend: Ic26f56c0d: Update MobileFrontend for cherry-picks (duration: 00m 05s)
01:12 ^d|voted: elastic1022: banned from allocation since it's unreachable. just in case it starts flapping.
01:11 mutante: elastic1022 - eth0: <NO-CARRIER
01:07 ori: upgrading HHVM app servers to 3.3.0+dfsg1-1+wm2
01:02 mutante: powercycling elastic1022
00:45 ^d|voted: elasticsearch: rebuilding all cirrus indexes for all wikis from a screen on terbium, going to take awhile. should be boring, but if causing problems kill it first and then find me.
00:24 logmsgbot: demon Synchronized php-1.25wmf6/includes/parser/Parser.php: (no message) (duration: 00m 04s)
00:23 logmsgbot: demon Synchronized php-1.25wmf6/includes/parser/CoreTagHooks.php: (no message) (duration: 00m 04s)
00:23 logmsgbot: demon Synchronized php-1.25wmf5/includes/parser/Parser.php: (no message) (duration: 00m 04s)
00:23 logmsgbot: demon Synchronized php-1.25wmf5/includes/parser/CoreTagHooks.php: (no message) (duration: 00m 05s)
00:22 logmsgbot: demon Synchronized php-1.25wmf6/extensions/CirrusSearch/maintenance/updateOneSearchIndexConfig.php: (no message) (duration: 00m 04s)
00:22 logmsgbot: demon Synchronized php-1.25wmf5/extensions/CirrusSearch/maintenance/updateOneSearchIndexConfig.php: (no message) (duration: 00m 04s)
00:04 logmsgbot: demon Synchronized wmf-config/CirrusSearch-common.php: (no message) (duration: 00m 04s)
00:03 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 07s)
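
Note on the elastic1022 allocation ban and unban logged above (01:12 and 18:02): the log does not say how the ban was applied. One common way is a transient cluster setting sent with PUT /_cluster/settings, excluding the node from shard allocation. The sketch below only builds the request body. Matching on `_host` rather than `_ip` or `_name` is an assumption, and the helper name is hypothetical.

```python
import json

# Cluster-level setting that keeps shards off the listed hosts.
EXCLUDE_KEY = "cluster.routing.allocation.exclude._host"


def allocation_exclude_body(hosts):
    """JSON body for PUT /_cluster/settings excluding hosts from shard allocation.

    An empty list produces an empty exclusion value, which lifts the ban.
    """
    return json.dumps({"transient": {EXCLUDE_KEY: ",".join(hosts)}}, sort_keys=True)


if __name__ == "__main__":
    print(allocation_exclude_body(["elastic1022"]))  # ban
    print(allocation_exclude_body([]))               # unban
```

A transient setting does not survive a full cluster restart, which suits a short-lived ban on a host with a hardware fault.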

November 4

19:22 cmjohnson: rebooting wtp1023
17:21 ejegg: updated crm from b8a1fa98b5d9252d708090c99b61fd22ebe8d2be to e9e81a828d50e8bddf98eae699c925e09b25927b
16:53 akosiaris: repool wtp1017,wtp1018,wtp1019,wtp1020
16:50 hashar: restarting Zuul/Jenkins entirely
16:45 logmsgbot: manybubbles Synchronized php-1.25wmf6/extensions/UniversalLanguageSelector/: SWAT update uls (duration: 00m 04s)
16:43 logmsgbot: manybubbles Synchronized php-1.25wmf5/extensions/UniversalLanguageSelector/: SWAT update uls (duration: 00m 04s)
16:33 hashar: Shutting down Jenkins to remove a deadlock :-(
16:26 hashar: Jenkins restarting Gearman client
16:24 hashar: Zuul on hold, waiting for beta cluster related jobs to complete
16:21 hashar: Jenkins: disconnecting/reconnecting gearman client , killing deployment-bastion.eqiad slave in an attempt to remove a deadlock bug 70597
14:06 akosiaris: upgrading kernels on amssq*
13:59 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 14s)
13:58 Reedy: graceful apache on mw1193
13:57 Reedy: graceful apache on mw1144
13:51 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 15s)
13:40 akosiaris: depool wtp1017, wtp1018, wtp1019, wtp1020 for trusty reinstall
13:39 akosiaris: upgrading apache2 throughout the mw cluster
13:24 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 19s)
13:14 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias to 1.25wmf6
13:03 logmsgbot: reedy Purged l10n cache for 1.25wmf4
12:41 akosiaris: repooled wtp1021,wtp1022,wtp1023
10:42 akosiaris: depooled wtp1021,wtp1022,wtp1023 for re-installation with trusty
06:31 springle: force logrotate ocg1001
03:32 springle: restart db2017
00:12 logmsgbot: catrope Synchronized php-1.25wmf6/extensions/MultimediaViewer: SWAT (duration: 00m 03s)
00:12 logmsgbot: catrope Synchronized php-1.25wmf6/extensions/MobileFrontend: SWAT (duration: 00m 04s)
00:12 logmsgbot: catrope Synchronized php-1.25wmf6/extensions/VisualEditor: SWAT (duration: 00m 04s)
00:09 logmsgbot: catrope Synchronized php-1.25wmf5/extensions/VisualEditor: SWAT (duration: 00m 04s)
00:05 logmsgbot: catrope Synchronized wmf-config/InitialiseSettings.php: Flow on officewiki and mw.org research page (duration: 00m 04s)

November 3

23:34 bd808: Changed elasticsearch template for logstash to use "doc_values" for raw fields. http://www.elasticsearch.org/blog/disk-based-field-data-a-k-a-doc-values/
23:01 cscott: reconfigured OCG logstash path to use bunyan. The _type field is currently missing (used to be "OfflineContentGenerator"). Will fix tomorrow.
22:32 cscott: updated OCG to version 5834af97ae80382f3368dc61b9d119cef0fe129b
21:55 ejegg: enabled recurring globalcollect processor
20:49 logmsgbot: maxsem Synchronized wmf-config/mobile.php: https://gerrit.wikimedia.org/r/#/c/170453/ (duration: 00m 03s)
20:23 logmsgbot: maxsem Synchronized wmf-config/InitialiseSettings.php: Enable WikiGrok on enwiki (duration: 00m 04s)
19:51 logmsgbot: maxsem Synchronized wmf-config/InitialiseSettings.php: Enable WikiGrok on test and test2 (duration: 00m 04s)
19:43 logmsgbot: maxsem Finished scap: Build localization cache for WikiGrok (duration: 35m 09s)
19:08 logmsgbot: maxsem Started scap: Build localization cache for WikiGrok
18:55 awight: restarting fredge consumer
18:09 awight: restarting donations queue consumer
18:09 awight: update crm from f47ed6f7e55946388db1dde787ca458c27a57c5a to b8a1fa98b5d9252d708090c99b61fd22ebe8d2be
16:57 akosiaris: repool wtp1024 at regular weight
16:34 _joe_: rolling-restarting hhvm appservers
16:25 godog: reboot ms-be2007, disk replaced but no corresponding raid0 LD
16:22 andrewbogott: added yuvi to 'Ops' ldap group
16:03 logmsgbot: anomie Synchronized docroot and w: (no message) (duration: 00m 10s)
14:38 akosiaris: wtp1024 re-installed as trusty
14:38 akosiaris: repool wtp1024 with a weight of 1 instead of 15 for now
13:18 akosiaris: depool wtp1024.eqiad.wmnet in preparation for reimaging to trusty
11:26 akosiaris: disable puppet on labsdb1004, labsdb1005 for postgresql reinitialization

November 2

20:41 logmsgbot: hoo Synchronized php-1.25wmf6/extensions/CentralAuth/: Fix LocalPageMoveJob (duration: 00m 08s)
20:41 logmsgbot: hoo Synchronized php-1.25wmf5/extensions/CentralAuth/: Fix LocalPageMoveJob (duration: 00m 09s)

November 1

11:07 logmsgbot: oblivian Synchronized wmf-config/CommonSettings.php: re-set hhvm to 5% of users (duration: 00m 05s)
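
Note on the HHVM percentage changes (5% here, 10%, 15% and 20% in other entries): the log does not describe how the percentage setting picks users. The sketch below shows one generic way a percentage rollout can be made deterministic, by hashing a stable per-user key into 100 buckets. It is an illustration under that assumption, not the logic in CommonSettings.php, and `in_rollout` is a hypothetical name.

```python
import hashlib


def in_rollout(user_key, percent):
    """Return True if user_key falls inside the first `percent` of 100 buckets.

    The same key always lands in the same bucket, so raising the percentage
    only adds users and never removes anyone already included.
    """
    digest = hashlib.sha256(user_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent


if __name__ == "__main__":
    for key in ("client-a", "client-b", "client-c"):
        print(key, in_rollout(key, 5), in_rollout(key, 20))
```

With this scheme, dropping from 10% back to 5% returns exactly the later-added half of the users to the old path.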

October 31

20:46 logmsgbot: aaron Synchronized php-1.25wmf5/includes/GlobalFunctions.php: 721435c3a6c8f7c728d3fa8ec34abb0f2ef7543d (duration: 00m 07s)
20:36 logmsgbot: aaron Synchronized php-1.25wmf6/includes/GlobalFunctions.php: 04c35b2ca42d7a186278882763eb853552d8441c (duration: 00m 04s)
18:36 ejegg: disabled recurring globalcollect
18:03 logmsgbot: maxsem Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/170358 (duration: 00m 04s)
15:25 logmsgbot: demon Synchronized wmf-config/CirrusSearch-production.php: (no message) (duration: 00m 04s)
14:59 logmsgbot: demon Synchronized php-1.25wmf6/extensions/CirrusSearch: (no message) (duration: 00m 04s)
14:59 logmsgbot: demon Synchronized php-1.25wmf5/extensions/CirrusSearch: (no message) (duration: 00m 04s)
14:56 _joe_: rotated logs on ocg1001, restarted both ocg and rsyslog
14:23 akosiaris: update DNS/NTP settings, add codfw on nas1001-a,b
13:27 manybubbles: reenable was uneventful. good news.
13:25 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: reenable cirrus everywhere where it has been after the outage has passed (duration: 00m 03s)
12:41 manybubbles: reenabled cirrus as betafeature - no spike in error logs
12:41 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: reenable cirrus as betafeature everywhere (duration: 00m 05s)
12:37 manybubbles: cirrus is working on test2wiki - we look to be recovered save for some loss of redundancy
12:36 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: reenable cirrus on testwiki (duration: 00m 04s)
12:32 logmsgbot: manybubbles Synchronized wmf-config/: Disable Cirrus accelerated regexes as we *think* they might be causing outages (duration: 00m 04s)
12:31 manybubbles: restart of elasticsearch nodes got them back to responsive. Cluster isn't fully healed yet but we're better than we were. Still not sure how we got this way
12:26 manybubbles: restarting all elasticsearch boxes in quick sequence. when I try restarting a frozen box another one freezes up (probably an evil request being retried on it after its buddy went down).
11:46 manybubbles: heap dumps aren't happening. Even with the config to dump them on oom errors. Restarting Elasticsearch nodes to get us back to stable and going to have to investigate from another direction.
11:30 manybubbles: restarting gmond on elasticsearch nodes so I can get a clearer picture of them
11:24 logmsgbot: oblivian Synchronized wmf-config/InitialiseSettings.php: ES is down, long live lsearchd (duration: 00m 09s)
10:52 godog: restarting elasticsearch on elastic1031, heap exhausted at 30G
01:14 springle: db1040 dberror spam is https://gerrit.wikimedia.org/r/#/c/169964/ only jobrunners affected, annoying but not critical

October 30

23:56 awight: update civicrm from 1f0dc2ce0ab84765c085cc0ee369a7a047c0d005 to f47ed6f7e55946388db1dde787ca458c27a57c5a
23:08 logmsgbot: demon Synchronized php-1.25wmf6/extensions/CirrusSearch: (no message) (duration: 00m 04s)
23:08 logmsgbot: demon Synchronized php-1.25wmf5/extensions/CirrusSearch: (no message) (duration: 00m 05s)
19:02 cmjohnson: powering off elastic1009-1002 to replace ssds
18:35 mutante: restarting nginx on toollabs webproxy
18:35 manybubbles: unbanning elastic1006 now that it is properly configured
17:54 _joe_: synchronized downsizing to 5%
17:54 logmsgbot: oblivian Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 06s)
17:42 _joe_: rolling restarted hhvm appservers
17:38 hashar: Zuul seems to be happy. Reverted my lame patch to send Cache-Control headers; since we have a cache breaker, it is not needed.
17:21 bd808: 10.64.16.29 is db1040 in the s4 pool
17:18 bd808: "Connection error: Unknown error (10.64.16.29)" 1052 in last 5m; 2877 in last 15m
17:16 hashar: Upgrading Zuul to have the status page emit a Cache-Control header bug 72766 wmf-deploy-20141030-1..wmf-deploy-20141030-2
17:11 bd808: Upgraded kibana to v3.1.1 again. Better testing now that logstash is working.
17:01 bd808: Logs on logstash1003 showed "Failed to flush outgoing items <Errno::EBADF: Bad file descriptor - Bad file descriptor>" on shutdown. Maybe something not quite right about elasticsearch_http plugin?
17:00 logmsgbot: awight Synchronized php-1.25wmf6/includes/specials/SpecialUpload.php: Parse 'upload_source_url' message on SpecialUpload (duration: 00m 10s)
16:59 bd808: restarted logstash on logstash1003. No events logged since 00:00Z
16:58 logmsgbot: awight Synchronized php-1.25wmf5/includes/specials/SpecialUpload.php: Parse 'upload_source_url' message on SpecialUpload (duration: 00m 11s)
16:58 bd808: restarted logstash on logstash1002. No events logged since 00:00Z
16:58 bd808: restarted logstash on logstash1001. No events logged since 00:00Z
16:54 akosiaris: uploaded php5_5.3.10-1ubuntu3.15+wmf1 on apt.wikimedia.org
16:46 bd808: Reverted kibana to e317bc6
16:44 logmsgbot: oblivian Synchronized wmf-config/CommonSettings.php: Serving 15% of anons with HHVM (ludicrous speed!) (duration: 00m 16s)
16:38 bd808: Upgraded kibana to v3.1.1 via Trebuchet
16:38 hashar: Zuul status page is freezing because the status.json is being cached :-/
16:31 logmsgbot: awight Synchronized php-1.25wmf6/extensions/CentralNotice: push CentralNotice updates (duration: 00m 09s)
16:28 logmsgbot: awight Synchronized php-1.25wmf5/extensions/CentralNotice: push CentralNotice updates (duration: 00m 11s)
16:22 manybubbles: moving shards off of elastic1003 and elastic1006 so they can be restarted. elastic1003 needs hyperthreading and elastic1006 needs noatime.
16:17 cmjohnson: powering off elastic1015-16 to replace ssds
16:04 hashar: restarted Zuul with upgraded version ( wmf-deploy-20140924-1..wmf-deploy-20141030-1 )
16:03 hashar: Stopping zuul
16:00 logmsgbot: hoo Synchronized wmf-config/CommonSettings.php: Fix oauthadmin (duration: 00m 09s)
15:43 hashar: Going to upgrade Zuul and monitor the result over the next hour.
15:39 ottomata: starting to reimage mw1032
15:29 logmsgbot: oblivian Synchronized wmf-config/CommonSettings.php: Serving 10% of anons with HHVM (duration: 00m 06s)
15:22 logmsgbot: reedy Synchronized docroot and w: Fix dbtree caching (duration: 00m 15s)
15:13 akosiaris: upgrading PHP on mw1113 to php5_5.3.10-1ubuntu3.15+wmf1
15:07 manybubbles: moving shards off of elastic1015 and elastic1016 so we can replace their hard drives/turn on hyper threading
15:07 logmsgbot: marktraceur Synchronized php-1.25wmf6/extensions/Wikidata/: [SWAT] [wmf6] Fix edit link for aliases (duration: 00m 12s)
14:37 cmjohnson: powering down elastic1003-1006 to replace ssds
14:33 _joe_: pooling mw1031/2 in the hhvm appservers pool
12:51 _joe_: rebooting mw1030 and mw1031 to use the updated kernel
12:48 akosiaris: enabled puppet on uranium
11:38 _joe_: depooling mw1030 and mw1031 for reimaging as hhvm appservers
10:15 _joe_: load test ended
09:48 _joe_: load testing the hhvm appserver pool as well
08:17 _joe_: powercycling mw1189, enabling hyperthreading
08:04 _joe_: doing the same with mw1189, to see how different appserver generations respond
07:25 _joe_: raising the weight of mw1114 in the api pool to test the throughput it can withstand
04:47 ori: enabled heap profiling on mw1189

October 29

23:42 ejegg: updated tool from 19928683a8112e9aadd71ba47f199885ba517a02 to 419fb7aa32c6d0776056968378e358ee01985565
23:38 logmsgbot: maxsem Synchronized php-1.25wmf6/extensions/MobileFrontend/: (no message) (duration: 00m 07s)
23:35 logmsgbot: maxsem Synchronized php-1.25wmf5/extensions/MobileFrontend/: (no message) (duration: 00m 04s)
23:13 logmsgbot: catrope Synchronized php-1.25wmf6/extensions/VisualEditor: SWAT (duration: 00m 04s)
23:00 mutante: restarting nginx on cp1044
22:11 AaronSchulz: Re-running setZoneAccess.php for swift
22:04 Krinkle: git-deploy: Deploying integration/slave-scripts a6a23ac1ec
20:28 subbu: reverted parsoid to version 617e9e61b625f25d79dfaab08830c396537be632 (due to stuck processes)
20:16 logmsgbot: reedy Synchronized wmf-config/mc-labs.php: noop for prod (duration: 00m 17s)
20:07 arlolra: updated Parsoid to version e5bc6da6e347a65cedf24a2284e51af881dce599
19:45 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 16s)
19:39 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 16s)
19:26 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 15s)
19:17 ori: upgraded HHVM to 3.3.0+dfsg1-1+wm1
18:58 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.25wmf6
18:57 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias to 1.25wmf5
18:47 logmsgbot: reedy Finished scap: testwiki to 1.25wmf6 and build l10n cache (duration: 28m 30s)
18:18 logmsgbot: reedy Started scap: testwiki to 1.25wmf6 and build l10n cache
17:24 cmjohnson: shutting down to replace ssds in elastic1002,1007,1014
17:07 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: I8dd62e2cc: Re-enable hhvm beta feature on Wikidata (duration: 00m 06s)
16:20 manybubbles: elastic101[7-9] look good to me - adding them to the cluster
16:17 manybubbles: shutting down elasticsearch on elastic1002 - it's empty and ready to have its disk upgraded/hyper threading enabled
16:05 manybubbles: ignore my last log message about 1017 - typod
16:05 manybubbles: shutting down elasticsearch on elastic1007 - it's empty and ready to have its disk upgraded/hyper threading enabled
16:04 manybubbles: shutting down elasticsearch on elastic1014 - it's empty and ready to have its disk upgraded/hyper threading enabled
16:03 manybubbles: shutting down elasticsearch on elastic1017 - it's empty and ready to have its disk upgraded/hyper threading enabled
15:39 manybubbles: start moving shards back to elastic1001 and elastic1008 now that they are up with hyperthreading on
15:37 Reedy: deleted php-1.24wmf21 from mediawiki-installation
15:36 Reedy: deleted php-1.24wmf20 from mediawiki-installation
15:35 Reedy: deleted php-1.24wmf19 from mediawiki-installation
15:35 akosiaris: uploaded apertium-apy_0.1+svn~57689-1 on apt.wikimedia.org
15:23 manybubbles: unbanned elastic1013 now that it is back with hyper threading on
15:21 logmsgbot: reedy Purged l10n cache for 1.25wmf3
15:20 logmsgbot: reedy Purged l10n cache for 1.25wmf2
15:19 bd808: Restarted logstash on logstash1002 to fix OCG and hadoop log events not being recorded
15:15 bd808: Restarted logstash on logstash1001. No MW events were being added to the index.
15:10 logmsgbot: anomie Synchronized wmf-config/throttle.php: SWAT: Raise account creation throttle at cawiki temporarily gerrit:169708 (duration: 00m 09s)
15:07 logmsgbot: anomie Synchronized php-1.25wmf5/extensions/Wikidata: SWAT: Fix WikiData "add links" widget JS error gerrit:169700 (duration: 00m 15s)
15:07 Reedy: Killed old (pre 1.25) l10nupdate cache dirs from tin:/var/lib/l10nupdate
15:00 manybubbles: started moving shards off of elastic1001, elastic1008, and elastic1013 so we can bounce them to enable hyper threading
14:55 manybubbles: started rolling shards back to elastic1001, elastic1008, and elastic1013 after hard drive upgrade
14:21 ottomata: set request.required.acks = 2 for all varnishkafkas
13:22 manybubbles: lowered replication on logstash's template for new indexes from 3 way to 2 way
13:20 logmsgbot: demon Synchronized wmf-config/lucene-production.php: unbreak lsearchd for commons, enwiktionary, etc (duration: 00m 04s)
13:11 manybubbles: lowered redundancy on logstash from 3 way to 2 way
13:01 cmjohnson: powering down/replacing elastic1017 and elastic1018
12:59 cmjohnson: disabling puppet on elastic1017 and 1018
12:01 cmjohnson: elastic1001, elastic1008 and elastic1013 powering down to replace ssds RT7779
12:01 springle: xtrabackup clone db1007 to db2029
07:16 ori: repooled mw1189 w/patched hhvm (<https://phabricator.wikimedia.org/T820#16428>)
03:39 ori: upgraded mw1114 to custom package with patch from https://phabricator.wikimedia.org/T820#16428 applied
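
The "moving shards off" and "unbanned" entries above describe draining Elasticsearch nodes before maintenance and readmitting them afterwards. A minimal sketch of how such a ban list could be expressed as a cluster settings request body, assuming the `cluster.routing.allocation.exclude._host` setting and short hostnames; the log does not record the exact commands used, and the helper name `exclusion_body` is hypothetical:

```python
import json


def exclusion_body(banned_hosts):
    """Build a transient PUT /_cluster/settings body that bans the given hosts.

    An empty collection yields an empty string, which clears the ban list.
    """
    value = ",".join(sorted(banned_hosts))
    return json.dumps(
        {"transient": {"cluster.routing.allocation.exclude._host": value}}
    )


# Hosts taken from the 15:00 entry; matching on short hostnames is an assumption.
banned = {"elastic1001", "elastic1008", "elastic1013"}
drain_request = exclusion_body(banned)

# "unbanned elastic1013": drop it from the set and resend the shorter list.
banned.discard("elastic1013")
unban_request = exclusion_body(banned)

print(drain_request)
print(unban_request)
```

The body is only built here, not sent; whichever HTTP client is at hand would submit it to the cluster's `/_cluster/settings` endpoint.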

October 28

23:12 logmsgbot: demon Synchronized php-1.25wmf5/extensions/Wikidata: (no message) (duration: 00m 10s)
23:05 _joe_: removed stale heap profile files from /run/hhvm on mw1114
23:02 logmsgbot: demon Synchronized wmf-config/CommonSettings.php: extension distributor stuffs (duration: 00m 05s)
22:09 ejegg: updated crm from ffa543cab3eb508fa38b94c6de2643d168b0d507 to 1f0dc2ce0ab84765c085cc0ee369a7a047c0d005
21:05 awight: reverted payments, from 647d1eb7d8cccb73fabf5ffded9f713d24576c37 to e3d235f881282120409e1a6ed1a3908ce9a63c26
21:02 hashar: Zuul back in action.
20:54 hashar: Zuul deadlocked again. Restarting Gearman plugin on Jenkins
20:53 awight: updated payments from 525988487d6bbd08ddad50badd88e34e34104292 to 647d1eb7d8cccb73fabf5ffded9f713d24576c37
20:29 manybubbles: removed /etc/elasticsearch/*.dpkg-dist from logstash machines - that was breaking logging for some reason. magic.
20:24 manybubbles: disabling puppet on logstash1003 and trying to run elasticsearch by hand to learn more about why it's borked.
19:42 andrewbogott: deleting unused labs projects: commons-dev, echo, farsi-wikitest
19:33 bd808: Elasticsearch not recovering indices at all on logstash1003 and no logging output
19:11 logmsgbot: reedy Synchronized php-1.25wmf5/extensions/CirrusSearch/: (no message) (duration: 00m 15s)
19:07 Reedy: frwikibooks collation updated
19:06 Reedy: Running mwscript updateCollation.php --wiki=frwikibooks --previous-collation=uppercase
19:05 logmsgbot: reedy Synchronized wmf-config/: All of the config changes! (duration: 00m 14s)
18:53 logmsgbot: reedy Synchronized wmf-config/: Bump cache epoch for Wikidata (duration: 00m 14s)
18:52 logmsgbot: reedy Finished scap: Split Cite extension, scap to build l10n cache for CiteThisPage (duration: 33m 52s)
18:29 bd808|LUNCH: restarted elasticsearch node on logstash1003
18:18 logmsgbot: reedy Started scap: Split Cite extension, scap to build l10n cache for CiteThisPage
18:11 logmsgbot: reedy Synchronized wmf-config/: Set = true for Italian Wikipedia in November-December 2014 (duration: 00m 14s)
18:05 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias to 1.25wmf5
17:27 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: I56082795: Modify $wmgAddWikiNotify for use by notifyNewProjects (duration: 00m 05s)
16:59 bd808: restarted logstash on logstash1002 to try and get gelf input events into kibana again
16:58 bd808: disk utilization on logstash100[123] greater than 80%
16:57 cscott: no logs for ocg/parsoid on logstash since 2014-10-27T18:50:46.104Z/2014-10-27T18:50:45.977Z (respectively)
16:57 logmsgbot: andrew Synchronized php-1.25wmf5/extensions/OpenStackManager: (no message) (duration: 00m 03s)
16:56 bd808: No new logs in /var/log/elasticsearch for logstash100[123] since Sep 30 06:25
16:55 logmsgbot: andrew Synchronized php-1.25wmf5/extensions/OpenStackManager: (no message) (duration: 00m 02s)
16:44 logmsgbot: reedy Synchronized php-1.25wmf4/extensions/OpenStackManager: (no message) (duration: 00m 14s)
15:47 andrewbogott: running sync-common on virt1000
15:47 logmsgbot: manybubbles Synchronized php-1.25wmf4/extensions/OpenStackManager/: SWAT update openstackmanager (duration: 00m 04s)
15:45 logmsgbot: manybubbles Synchronized php-1.25wmf5/extensions/OpenStackManager/: SWAT update openstackmanager (duration: 00m 04s)
15:24 logmsgbot: manybubbles Synchronized wmf-config/: SWAT cirrus regex queues too small? (duration: 00m 05s)
15:11 logmsgbot: manybubbles Synchronized php-1.25wmf5/extensions/Wikidata/: SWAT update wikidata (duration: 00m 10s)
15:03 logmsgbot: manybubbles Synchronized wmf-config/: SWAT cirrus config updates - (hopefully) faster regexes (duration: 00m 06s)
14:51 godog: rolling-restart of eqiad ms-fe* after https://gerrit.wikimedia.org/r/#/c/167310/
14:04 godog: reload swift frontend in eqiad after password rotation
14:04 logmsgbot: demon Synchronized wmf-config/PrivateSettings.php: (no message) (duration: 00m 04s)
13:48 logmsgbot: manybubbles Synchronized php-1.25wmf5/extensions/CirrusSearch/: (no message) (duration: 00m 05s)
13:47 logmsgbot: manybubbles Synchronized php-1.25wmf4/extensions/CirrusSearch/: (no message) (duration: 00m 11s)
01:01 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: Turn Cirrus back on basically everywhere. If Elasticsearch freaks out again just revert I73ae276e to get back to lsearchd again (duration: 00m 04s)
00:43 logmsgbot: ori Synchronized php-1.25wmf4/extensions/WikimediaEvents/WikimediaEventsHooks.php: I4adffaa26: Actually unset the HHVM cookie (duration: 00m 03s)
00:43 logmsgbot: ori Synchronized php-1.25wmf5/extensions/WikimediaEvents/WikimediaEventsHooks.php: I4adffaa26: Actually unset the HHVM cookie (duration: 00m 03s)
00:27 awight: reenabling recurring GlobalCollect job
00:07 awight: updated crm from 9bb50403616d80aa8d39a89ab59965f53e9e3f3d to ffa543cab3eb508fa38b94c6de2643d168b0d507

October 27

23:52 bd808: Restarted logstash service on logstash1001 because I was not seeing any events from MW make it into kibana
23:27 logmsgbot: maxsem Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/169229/ for reals now (duration: 00m 04s)
23:26 Reedy: restarted logstash on logstash1001
23:23 logmsgbot: maxsem Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/169229/ (duration: 00m 04s)
23:22 Tim: on mw1114: disabled puppet, enabled Eval.PerfPidMap, restarted hhvm
23:21 awight: updated crm from 5b395c37dc596736ecafceeb156221e3751bfe37 to 9bb50403616d80aa8d39a89ab59965f53e9e3f3d
23:21 awight: disabling recurring globalcollect job
23:20 logmsgbot: maxsem Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/168771/ (duration: 00m 04s)
23:17 logmsgbot: maxsem Synchronized php-1.25wmf4/extensions/VisualEditor/: (no message) (duration: 00m 04s)
23:16 logmsgbot: maxsem Synchronized php-1.25wmf4/extensions/MobileFrontend/: (no message) (duration: 00m 05s)
23:14 logmsgbot: maxsem Synchronized php-1.25wmf5/extensions/MobileFrontend/: (no message) (duration: 00m 04s)
23:14 logmsgbot: maxsem Synchronized php-1.25wmf5/extensions/VisualEditor/: (no message) (duration: 00m 05s)
23:14 logmsgbot: maxsem Synchronized php-1.25wmf5/extensions/Wikidata/: (no message) (duration: 00m 10s)
23:13 logmsgbot: maxsem Synchronized php-1.25wmf3/extensions/Wikidata/: (no message) (duration: 00m 12s)
23:06 logmsgbot: maxsem Synchronized wmf-config/Wikibase.php: https://gerrit.wikimedia.org/r/#/c/169192/ (duration: 00m 04s)
22:58 awight: reenabling recurring globalcollect job
22:54 awight: rollback civicrm from 9bb50403616d80aa8d39a89ab59965f53e9e3f3d to 5b395c37dc596736ecafceeb156221e3751bfe37
22:53 awight: updated civicrm from 5b395c37dc596736ecafceeb156221e3751bfe37 to 9bb50403616d80aa8d39a89ab59965f53e9e3f3d
22:50 logmsgbot: aaron Synchronized wmf-config/flaggedrevs.php: Removed $wgFlaggedRevsProtectQuota for enwiki (duration: 00m 03s)
22:46 awight: disabling recurring GlobalCollect job
22:45 Tim: activated heap profiling on mw1114
22:21 AaronSchulz: Running cleanupBlocks.php on all wikis
22:18 logmsgbot: aaron Synchronized php-1.25wmf4/maintenance: 64fe61e0dbfea84d2bab4c17bf01f5dfdf5cc3b5 (duration: 00m 04s)
22:12 logmsgbot: aaron Synchronized wmf-config/CommonSettings.php: Stop GWT wgJobBackoffThrottling values from getting lost (duration: 00m 03s)
20:35 subbu: deployed parsoid sha 617e9e61
20:26 cscott: updated OCG to version 60b15d9985f881aadaa5fdf7c945298c3d7ebeac
20:10 logmsgbot: maxsem Synchronized php-1.25wmf4/extensions/GeoData: GeoData back to normal (duration: 00m 03s)
19:39 manybubbles: after restarting elasticsearch we expected to get memory errors again. no such luck so far....
18:57 manybubbles: completed restarting elasticsearch cluster. now it'll make a useful file on out of memory errors. raised the recovery throttling so it'll recover fast enough to cause oom errors
18:47 logmsgbot: maxsem Synchronized php-1.25wmf4/extensions/GeoData: live hack to disable geosearch (duration: 00m 04s)
18:37 manybubbles: note that this is a restart without waiting for the cluster to go green after each restart. I expect lots of whining from icinga. This will cause us to lose some updates but should otherwise be safe.
18:34 manybubbles: restarting elasticsearch servers to pick up new gc logging and to reset them into a "working" state so they can have their gc problem again and we can log it properly this time.
18:15 logmsgbot: aaron Synchronized wmf-config/CommonSettings.php: Remove obsolete flags (all of them) from $wgAntiLockFlags (duration: 00m 07s)
17:53 cmjohnson: replacing disk /dev/sdl slot 11 ms-be1013
17:37 _joe_: uploaded a version of jemalloc for trusty with --enable-prof
16:31 ^d: elasticsearch: temporarily raised node_concurrent_recoveries from 3 to 5.
15:32 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: Enable Cirrus as secondary everywhere, brings back GeoData (duration: 00m 04s)
15:08 manybubbles: It's unclear how much of the master going haywire is something that'll be fixed in elasticsearch 1.4. They've done a lot of work there on the cluster state communication.
15:07 manybubbles: for posterity 10/18 of the elasticsearch servers had got to the point where they couldn't free any heap. It's currently not clear to me why they did that. This caused the cluster to basically collapse. The master node kept being unable to communicate with anyone because everyone was pausing for multiple minutes between replies. The cluster handshaking couldn't cope with that and promptly got itself into a state where nodes were both part of the cluster and not part of the cluster at the same time. That's bad.
15:03 manybubbles: restarting gmond on all elasticsearch systems because stats aren't updating properly in ganglia and usually that helps
15:02 manybubbles: restarted a bunch of the elasticsearch nodes that had their heap full. wasn't able to get a heap dump on any of them because they all froze while trying to get the heap dump.
14:32 ^d: elasticsearch: disabling replica allocation, less things moving about if we restart cluster
13:47 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: fall back to lsearchd for a bit (duration: 00m 05s)
13:41 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 05s)
13:29 manybubbles: restarted elasticsearch on elastic1017 - memory was totally full there
13:21 manybubbles: elastic1008 is logging gc issues. restarting it because that might help it
05:04 springle: forced logrotate ocg1001
03:36 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Oct 27 03:36:39 UTC 2014 (duration 36m 38s)
02:27 logmsgbot: LocalisationUpdate completed (1.25wmf5) at 2014-10-27 02:27:45+00:00
02:17 logmsgbot: LocalisationUpdate completed (1.25wmf4) at 2014-10-27 02:17:08+00:00
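
The 14:32 and 16:31 entries above mention two cluster-level Elasticsearch adjustments: disabling replica allocation and raising `node_concurrent_recoveries` from 3 to 5. A minimal sketch of the corresponding request bodies, assuming the Elasticsearch 1.x setting names `cluster.routing.allocation.enable` and `cluster.routing.allocation.node_concurrent_recoveries` applied as transient settings; the log does not say which settings or scope were actually used, and `transient_body` is a hypothetical helper:

```python
import json


def transient_body(settings):
    """Return the JSON body for a transient PUT /_cluster/settings call."""
    return json.dumps({"transient": dict(settings)}, sort_keys=True)


# Assumed equivalent of "disabling replica allocation":
# only primary shards may be allocated until this is set back to "all".
disable_replicas = transient_body(
    {"cluster.routing.allocation.enable": "primaries"}
)

# Assumed equivalent of raising node_concurrent_recoveries from 3 to 5.
raise_recoveries = transient_body(
    {"cluster.routing.allocation.node_concurrent_recoveries": 5}
)

print(disable_replicas)
print(raise_recoveries)
```

Transient settings are dropped on a full cluster restart, which suits temporary changes like these.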

October 26

23:46 Krinkle: Force restarted Zuul
15:14 Krinkle: Jenkins/Zuul is stuck as of 20 hours ago
15:06 _joe_: restarted hhvm on mw1114, memory nearly exhausted
03:36 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Oct 26 03:36:20 UTC 2014 (duration 36m 19s)
02:25 logmsgbot: LocalisationUpdate completed (1.25wmf5) at 2014-10-26 02:25:47+00:00
02:15 logmsgbot: LocalisationUpdate completed (1.25wmf4) at 2014-10-26 02:15:12+00:00

October 25

22:49 paravoid: upgrading JunOS on cr1-ulsfo
22:32 paravoid: scheduling downtime for all ulsfo -lb- & cr1/2-ulsfo
21:30 logmsgbot: ori Synchronized php-1.25wmf5/extensions/CentralNotice/CentralNotice.hooks.php: Iee2072ac7: Make sure we declare globals before using them (duration: 00m 06s)
21:30 logmsgbot: ori Synchronized php-1.25wmf4/extensions/CentralNotice/CentralNotice.hooks.php: Iee2072ac7: Make sure we declare globals before using them (duration: 00m 06s)
20:41 bd808: updated logstash-* labs instances to salt minion 2014.1.11 (thanks for the ping apergos)
14:43 apergos: all active labs instances now running salt minion 2014.1.11 except for: logstash-* (have their own master), fabapi (pingable, can't ssh on), upload-wizard (running oneiric, not setting up a repo for that!). instances shutoff or w/ nova error were left untouched
03:46 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Oct 25 03:46:48 UTC 2014 (duration 46m 47s)
02:29 logmsgbot: LocalisationUpdate completed (1.25wmf5) at 2014-10-25 02:29:29+00:00
02:18 logmsgbot: LocalisationUpdate completed (1.25wmf4) at 2014-10-25 02:18:14+00:00
00:27 awight: updated DjangoBannerStats from cf5a875d49f4c4cf229d7f864a73d4c2f588ebf9 to a3038f133d64c737d3987bd1c37a987fd3003dd6

October 24

22:40 akosiaris: puppet disabled on uranium, do not enable
20:52 andrewbogott: revived virt1006 on a probationary basis. It's running compute but is disabled so new instances won't be scheduled there. I've moved a few test instances there to see how it behaves.
20:36 logmsgbot: andrew Synchronized wmf-config/wikitech.php: (no message) (duration: 00m 04s)
20:28 Reedy: sync-common on mw1088
20:23 mutante: mw1088 - gzipping core dump files, disabled core dumps, restarted apache
20:15 mutante: mw1088 - gzip other_vhosts_access.log.1 - Avail. 38G
20:15 Reedy: / full on mw1088 due to apache core dumps
20:09 Reedy: running sync-common on mw1041
20:04 mutante: powercycled mw1041
20:03 logmsgbot: reedy Synchronized php-1.25wmf5/extensions/SemanticForms/: noop for prod (duration: 00m 17s)
20:01 Reedy: mw1041 is down
20:01 Reedy: mw1088 has a full /
20:00 logmsgbot: reedy Synchronized php-1.25wmf4/extensions/SemanticForms/: noop for prod (duration: 00m 16s)
19:53 bblack: nickel's basically dead, uranium has been promoted to prod ganglia a little early for now
19:22 awight: updated payments from 6fa864d4aaa22b9f271de4bc662be68bb0b40b56 to 525988487d6bbd08ddad50badd88e34e34104292
18:55 ori: repooled mw1189 to do heap profiling on production api workload.
17:58 mutante: stat1001 - Duplicate declaration: Package[nodejs]
17:07 cmjohnson: getting ready to replace a failed disk on ganglia (server:nickel)...it will be offline for a few minutes
17:05 ejegg: updated dash from 58fda9403dd33e4d47238f119b6bb2b2905856b1 to 69c9330d6983873ffa9bb87fcd783be03382bdfc
15:50 awight: campaigns reenabled
15:40 awight: payments db migrated to 1.23 schema
15:37 awight: updated payments to REL1_23, 6fa864d4aaa22b9f271de4bc662be68bb0b40b56
15:18 awight: payments in maintenance mode
14:57 robh: francium going offline, ignore any icinga warning
14:37 andrewbogott: running sync-common on virt1000
14:36 logmsgbot: andrew Synchronized wmf-config/wikitech.php: (no message) (duration: 00m 03s)
14:35 logmsgbot: andrew Synchronized wmf-config/wikitech.php: (no message) (duration: 00m 02s)
13:39 akosiaris: disabled puppet on uranium. Testing ganglia with SSDs
12:10 akosiaris: restarted gmetad on nickel, it was not responding on port 8654
05:04 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Oct 24 05:04:33 UTC 2014 (duration 4m 32s)
03:22 logmsgbot: LocalisationUpdate completed (1.25wmf5) at 2014-10-24 03:22:36+00:00
02:46 logmsgbot: LocalisationUpdate completed (1.25wmf4) at 2014-10-24 02:46:15+00:00

October 23

23:39 logmsgbot: catrope Synchronized php-1.25wmf4/extensions/CentralAuth/: SWAT (duration: 00m 04s)
23:38 logmsgbot: catrope Synchronized php-1.25wmf4/extensions/AntiSpoof/: SWAT (duration: 00m 06s)
23:29 logmsgbot: catrope Synchronized php-1.25wmf5/extensions/TimedMediaHandler/: SWAT (duration: 00m 04s)
23:29 logmsgbot: catrope Synchronized php-1.25wmf5/extensions/UploadWizard/: SWAT (duration: 00m 06s)
23:28 logmsgbot: catrope Synchronized php-1.25wmf4/extensions/UploadWizard/: SWAT (duration: 00m 04s)
23:28 logmsgbot: catrope Synchronized php-1.25wmf4/extensions/TimedMediaHandler/: SWAT (duration: 00m 04s)
23:27 logmsgbot: catrope Synchronized php-1.25wmf4/includes/api/ApiFormatBase.php: SWAT (duration: 00m 04s)
22:42 hashar: Jenkins is all good now.
22:36 hashar: Jenkins restarting
22:28 hashar: preparing jenkins for restart
21:54 hashar: Jenkins: the Gearman plugin is holding a lock on the deployment-bastion slave that prevents it from running any job :-/
21:51 ejegg: updated civicrm from ad3386cd0f9b776e2fded7c4e6b1195e05ed669c to 937df4dacae0dd620ae9e8fed13566d51c1b18a4
21:46 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1018 (duration: 00m 06s)
21:43 hashar: Jenkins: disabling / reenabling Gearman plugin
21:35 hashar: Jenkins: disconnected / reconnected slave node deployment-bastion.eqiad
21:15 awight: updated crm from 03b15f7dad58ce61894d632e8fbebd2ae76ae4d0 to ad3386cd0f9b776e2fded7c4e6b1195e05ed669c
21:04 ejegg: updated civicrm from 0a3ab0f18ce726898d14adcbe6ab08411c9e3e82 to 03b15f7dad58ce61894d632e8fbebd2ae76ae4d0
19:29 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.25wmf5
19:26 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias to 1.25wmf4
19:19 logmsgbot: reedy Finished scap: testwiki to 1.25wmf5 (duration: 32m 55s)
18:46 logmsgbot: reedy Started scap: testwiki to 1.25wmf5
18:34 awight: updated crm from d6a75b6df4482de61da372fa653902db7ca12766 to 0a3ab0f18ce726898d14adcbe6ab08411c9e3e82
18:19 logmsgbot: tgr Synchronized wmf-config/InitialiseSettings.php: Disable ImageMetrics on non-public wikis (duration: 00m 05s)
18:03 logmsgbot: tgr Synchronized wmf-config/InitialiseSettings.php: Enable ImageMetrics on all wikis (duration: 00m 05s)
17:14 logmsgbot: tgr Synchronized wmf-config/InitialiseSettings.php: Enable ImageMetrics on group0 (duration: 00m 05s)
17:08 logmsgbot: tgr Finished scap: Deploying ImageMetrics extension (duration: 32m 04s)
16:54 paravoid: cr2-ulsfo: upgrading junos again
16:36 logmsgbot: tgr Started scap: Deploying ImageMetrics extension
15:46 paravoid: preparing to upgrade JunOS on cr2-ulsfo
15:31 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: db1065 to normal load (duration: 00m 08s)
15:19 logmsgbot: anomie Synchronized php-1.25wmf4/api.php: SWAT: Include ApiMain construction in api.php try-catch block gerrit:168296 (duration: 00m 09s)
15:19 logmsgbot: anomie Synchronized php-1.25wmf4/includes/api/ApiMain.php: SWAT: Include ApiMain construction in api.php try-catch block gerrit:168128 (duration: 00m 09s)
15:11 logmsgbot: anomie Synchronized php-1.25wmf4/includes/api/ApiFormatFeedWrapper.php: SWAT: Fix ApiFormatFeedWrapper gerrit:168128 (duration: 00m 09s)
14:25 ottomata: varnishkafka request.required.acks is now 2 for text, mobile, and bits.
12:46 hashar: killed left over java/jenkins process on gallium
12:09 Krinkle: Zuul/Jenkins stuck. Tried various gearman/zuul resets. Restarting Jenkins now.
07:25 _joe_: restarted hhvm on mw1114, depooled the server
05:39 Tim: on mw1189 testing some URLs at a high rate, attempting to induce measurable memory leak
05:06 Tim: reverted unexplained uncommitted modification of palladium:/srv/pybal-config/pybal/eqiad/api which repooled mw1189
03:41 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Oct 23 03:41:49 UTC 2014 (duration 41m 48s)
03:06 logmsgbot: cscott Synchronized wmf-config/filebackend.php: fix using a file from commons with file name length between 140 and 159 (duration: 00m 20s)
03:01 Krinkle: git-deploy: Deploying integration/slave-scripts 157ef23
02:31 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1065, warm up (duration: 00m 06s)
02:27 logmsgbot: LocalisationUpdate completed (1.25wmf4) at 2014-10-23 02:27:48+00:00
02:15 logmsgbot: LocalisationUpdate completed (1.25wmf3) at 2014-10-23 02:15:22+00:00
02:10 springle: removed old /var/log/ocg.log* on ocg1002, forced a logrotate
02:06 springle: upgrade reboot db1065

October 22

23:25 logmsgbot: demon Synchronized wmf-config/CommonSettings-labs.php: (no message) (duration: 00m 04s)
23:25 logmsgbot: demon Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 04s)
23:22 logmsgbot: demon Synchronized php-1.25wmf4/extensions/Collection: (no message) (duration: 00m 04s)
23:21 logmsgbot: demon Synchronized php-1.25wmf3/extensions/Collection: (no message) (duration: 00m 04s)
23:08 mutante: added jhernandez to wmf LDAP group
22:54 bd808: forced puppet run on logstash1003 to pick up https://gerrit.wikimedia.org/r/#/c/168199/
22:47 bd808: forced puppet run on logstash1002 to pick up https://gerrit.wikimedia.org/r/#/c/168199/
22:42 bd808: forced puppet run on logstash1001 to pick up https://gerrit.wikimedia.org/r/#/c/168199/
22:21 bblack: depooled amssq42 (esams text) for trusty testing
21:38 bd808: killed duplicate logstash services running on logstash1001 and restarted
21:16 arlolra: updated OCG to version e977e2c8ecacea2b4dee837933cc2ffdc6b214cb
21:06 bd808: forced puppet run on logstash1001 to pick up https://gerrit.wikimedia.org/r/#/c/168182/
20:40 bd808: forced puppet run on logstash1001 to pick up https://gerrit.wikimedia.org/r/168089
20:37 mutante: powercycling unresponsive ms-be1012 (this happened before, search SAL for hostname)
20:25 arlolra: updated Parsoid to version 2a8dc85ce676391acd8c6255c4f94250612c9ee2
16:59 andrewbogott: reinstalling OS on virt1006
15:57 logmsgbot: marktraceur Synchronized wmf-config/InitialiseSettings.php: [SWAT] Send collection logs to logstash. (duration: 00m 05s)
15:31 ottomata: setting request.required.acks to 2 for mobile and text varnishkafkas (mobile was set to 2 yesterday)
15:23 _joe_: rolling restart of hhvm appservers, to alleviate memory issues
15:09 logmsgbot: marktraceur Synchronized wmf-config/InitialiseSettings.php: [SWAT] Pre-render thumbnails on upload on Commons (duration: 00m 05s)
15:05 logmsgbot: marktraceur Synchronized wmf-config/CommonSettings-labs.php: [SWAT] Re-enable PediaPress POD in production. (duration: 00m 05s)
15:05 logmsgbot: marktraceur Synchronized wmf-config/CommonSettings.php: [SWAT] Re-enable PediaPress POD in production. (duration: 00m 05s)
13:38 godog: catch-up swiftrepl sync eqiad -> codfw for commons containers
09:41 hashar: Zuul/Jenkins in a deadlock
09:08 hashar: Restarting Jenkins
09:07 hashar: Jenkins: upgrading gearman-plugin from 0.0.7-1-g3811bb8 to 0.1.0-1-gfa5f083, i.e. bringing us to the latest version + 1 commit
03:54 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Oct 22 03:54:53 UTC 2014 (duration 54m 52s)
02:30 logmsgbot: LocalisationUpdate completed (1.25wmf4) at 2014-10-22 02:30:07+00:00
02:17 logmsgbot: LocalisationUpdate completed (1.25wmf3) at 2014-10-22 02:17:49+00:00

October 21

23:21 logmsgbot: maxsem Synchronized php-1.25wmf4/includes/PrefixSearch.php: https://gerrit.wikimedia.org/r/#/c/167982/ (duration: 00m 03s)
23:09 logmsgbot: maxsem Synchronized php-1.25wmf3/extensions/MobileFrontend/: SWAT (duration: 00m 04s)
23:08 logmsgbot: maxsem Synchronized php-1.25wmf4/extensions/MobileFrontend/: SWAT (duration: 00m 04s)
23:07 logmsgbot: maxsem Synchronized php-1.25wmf4/extensions/CentralAuth/: SWAT (duration: 00m 04s)
22:47 mutante: radium - installed OS, signing puppet cert requests, initial run ...
22:42 legoktm: manually finished global rename for BonumTV --> Karypal which failed due to page move timeout
22:13 andrewbogott: deleted unused labs projects: versionview, feeds, datadog, fundraising-awight, simplewiki, mediawiki-custom-de, fundraising, sartoris, wikibits, incubator, wikiversity-sandbox, data4all
21:33 K4-7131: adjusting payments antifraud filters
21:29 logmsgbot: ebernhardson Synchronized php-1.25wmf4/extensions/LiquidThreads/api/ApiQueryLQTThreads.php: Bump LQT in 1.25wmf4 (duration: 00m 04s)
20:41 cscott: updated OCG to version 523c8123cd826c75240837c42aff6301032d8ff1
20:40 logmsgbot: hoo Synchronized wmf-config/CommonSettings.php: Set extendwatchlist = 0 (duration: 00m 08s)
20:37 hashar: lanthanum /var/lib/jenkins-slave/tmpfs went full again. cleared up a bunch of files
20:22 _joe_: installing new hhvm packages on the depooled server mw1189, for debugging
20:14 AaronSchulz: Deployed deployment/jobrunner to d426235e10edc682b532e7b4f2b02bb9414661ba
19:47 logmsgbot: hoo Synchronized wmf-config/InitialiseSettings.php: Add import sources for orwikisource (duration: 00m 08s)
19:12 qchris: Added its-phabricator plugin (d425a5ded909ee73df53d5e6d91d28014d0be375) into gerrit
18:40 chasemp: ran puppet on gerrit, which restarted the service
18:25 logmsgbot: reedy Synchronized php-1.25wmf3/extensions/Collection/: (no message) (duration: 00m 13s)
18:24 logmsgbot: reedy Synchronized php-1.25wmf4/extensions/Collection/: (no message) (duration: 00m 14s)
18:21 logmsgbot: reedy Synchronized database lists: (no message) (duration: 00m 26s)
18:20 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 18s)
18:07 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias to 1.25wmf4
16:54 _joe_: stopping puppet on mw1114 in order to do some jemalloc debugging
16:40 K4-713: adjusted payments fraud filters for WP test
14:45 godog: start catch up swiftrepl on ms-fe1003 for 'notcommons' containers
14:28 ottomata: set vm.dirty_writeback_centisecs = 200 (was 500) on analytics1021
14:23 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1065 (duration: 00m 06s)
11:56 godog: silenced *-lb.ulsfo.wikimedia.org
11:43 godog: drained ulsfo via DNS, GTT link problems
11:09 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1066, warm up (duration: 00m 06s)
09:09 akosiaris: enabled icinga-wm again
08:50 akosiaris: temporarily killed icinga-wm
04:21 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Oct 21 04:21:19 UTC 2014 (duration 21m 18s)
03:06 logmsgbot: LocalisationUpdate completed (1.25wmf4) at 2014-10-21 03:06:47+00:00
02:33 logmsgbot: LocalisationUpdate completed (1.25wmf3) at 2014-10-21 02:33:35+00:00
02:26 mutante: rebooting iron for upgrades
02:12 mutante: restarting Apache on strontium
01:58 mutante: restarting apache on puppetmaster, temp. stopping icinga-wm
01:31 K4-713: adjusted fraud filters on payments
00:32 logmsgbot: ori Synchronized wmf-config: Id01fe7aac: Turn off spammy message cache log (duration: 00m 05s)

October 20

23:56 awight: updated payments to 4cf8eb06a4746478c6424648c94688bf460cf63d
23:31 springle: upgrade db1004 trusty and reboot
23:20 mutante: installing package upgrades on iron
23:03 logmsgbot: demon Synchronized wmf-config/CommonSettings-labs.php: no-op, for completeness (duration: 00m 05s)
22:38 awight|purgreen: updating payments config with French Snowflake hack
22:37 Tim: doing load testing on mw1189
22:28 Tim: enabled puppet on mw1189
21:08 cscott: redis-cli srem "deploy:ocg/ocg:minions" tantalum.eqiad.wmnet
21:08 bblack: baham running gdnsd-2.1.0 test pkg
21:08 bd808: Deployed iegreview 203d509 (Disable strict variables for twig)
21:01 cscott: updated OCG to version ea10c93aca9bc1cae34f284fd74bb05d4b6a8cc6
20:33 jgage: ulsfo repooled in dns
20:17 jgage: ulsfo fpc 1 mic 1 card swap complete
20:11 jgage: beginning router card swap in ulsfo
20:11 subbu: deployed Parsoid version d4567e9f
20:01 paravoid: cr1-ulsfo: reenable ospf/ospf3 (GTT is stable)
19:13 mutante: importing schema, data, users into mysql for iegreview
17:41 apergos: all trusty and lucid hosts now running salt 2014.1.11 (this includes labcontrol2001, salt master for future codfw labs)
16:48 logmsgbot: reedy Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 14s)
16:47 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: orwikisource
16:44 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 15s)
16:44 logmsgbot: reedy Synchronized database lists: orwikisource (duration: 00m 13s)
16:43 cmjohnson: reseating pem3 cr2-eqiad
16:43 apergos: all precise hosts salt updated to 2014.1.11, this includes tin (deployment) and virt1000 (salt master for labs). Not updated: virt1006 (inaccessible)
15:58 logmsgbot: anomie Synchronized wmf-config: SWAT: Enable wgSecurePollUseNamespace for testwiki gerrit:167592 (duration: 00m 10s)
15:57 logmsgbot: anomie Finished scap: (no message) (duration: 18m 22s)
15:39 logmsgbot: anomie Started scap: (no message)
15:38 logmsgbot: anomie Synchronized php-1.25wmf4/extensions/SecurePoll/: SWAT: Update SecurePoll for testing on testwiki gerrit:167586 (duration: 00m 10s)
15:20 _joe_: disabling puppet on mw1189 to do some hhvm testing
15:20 logmsgbot: anomie Synchronized php-1.25wmf3/resources/lib/oojs-ui/: SWAT: OOJS-UI bug fixes gerrit:167344 (duration: 00m 12s)
15:19 logmsgbot: anomie Synchronized php-1.25wmf3/extensions/VisualEditor/: SWAT: VE bug fixes gerrit:167344 (duration: 00m 10s)
15:08 logmsgbot: anomie Synchronized php-1.25wmf4/extensions/VisualEditor/: SWAT: VE bug fixes gerrit:167577 (duration: 00m 10s)
14:26 paravoid: cr1-ulsfo: deactivating ospf/ospf3 on GTT ulsfo-eqiad link
13:26 paravoid: cr2-ulsfo: "request chassis mic {off,on}line fpc-slot 1 mic-slot 1" to reboot broken card
12:28 apergos: upgraded salt master (plus minion) on palladium to 2014.1.11, all new precise installs will get this version now, other minion upgrades to follow shortly
11:51 godog: temporarily stopped ircecho/icinga-wm on neon, shower of alarms
11:42 godog: killed stray/old copy of diamond that was filling up conntrack on virt1000
09:53 akosiaris: restarted ocg on ocg1001, ocg1002, ocg1003
07:07 _joe_: rolling restart of ocg services
04:29 springle: removed old /var/log/ocg* on ocg1001 and ocg1003 and forced logrotate, / space critical
03:42 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Oct 20 03:42:24 UTC 2014 (duration 42m 23s)
02:50 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1042 (duration: 00m 06s)
02:34 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1066 (duration: 00m 06s)
02:28 logmsgbot: LocalisationUpdate completed (1.25wmf4) at 2014-10-20 02:28:35+00:00
02:16 logmsgbot: LocalisationUpdate completed (1.25wmf3) at 2014-10-20 02:16:31+00:00

October 19

03:36 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Oct 19 03:36:03 UTC 2014 (duration 36m 2s)
02:25 logmsgbot: LocalisationUpdate completed (1.25wmf4) at 2014-10-19 02:25:44+00:00
02:14 logmsgbot: LocalisationUpdate completed (1.25wmf3) at 2014-10-19 02:14:23+00:00

October 18

15:09 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1061, warm up (duration: 00m 06s)
13:49 _joe_: restarted apache on the puppetmasters
03:51 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Oct 18 03:51:18 UTC 2014 (duration 51m 17s)
02:33 logmsgbot: LocalisationUpdate completed (1.25wmf4) at 2014-10-18 02:33:01+00:00
02:20 logmsgbot: LocalisationUpdate completed (1.25wmf3) at 2014-10-18 02:20:20+00:00

October 17

23:46 bd808: Ran trebuchet to create initial tag for iegreview/iegreview
23:35 ori: pooled mw1114 (hhvm api server) to test whether new package resolves overload behavior
23:28 ori: updating hhvm app servers to 3.3.0-20140925+wmf3
22:18 mutante: graceful Apache on antimony - svn fixed, gitblit behind varnish
22:09 mutante: graceful Apache on neon - icinga and tendril done, ishmael = misc-web
21:46 mutante: graceful Apache on stat1001
21:25 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool es1004, es1007, es1010 (duration: 00m 07s)
21:19 mutante: graceful Apache on netmon1001
21:13 mutante: graceful Apache on puppetmaster
20:34 andrewbogott: removed some stray .zip files from /tmp on ocg1002
20:16 ori: disabled puppet on osmium to debug hhvm
15:36 paravoid: killed tampa config remnants on all cr1/cr2s
15:30 bblack: restarted puppetmasters
14:44 _joe_: load test on hhvm done
14:26 _joe_: load test on the hhvm cluster
10:41 akosiaris: uploaded apertium 3.3 on apt.wikimedia.org (trusty-wikimedia)
09:17 _joe_: manually killed long-running stuck processes on ocg1001, moving to the rest of the cluster
08:56 _joe_: restarted the ocg cluster
08:44 akosiaris: uploaded cg3 and lttoolbox on apt.wikimedia.org
05:37 _joe_: depooling both hhvm api appservers
04:52 springle: xtrabackup clone es1010 to es2008
04:51 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Oct 17 04:51:02 UTC 2014 (duration 51m 1s)
04:17 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: reduce load on db1059 (duration: 00m 21s)
04:05 springle: upgrade es1010 to trusty (clone failed, needs trusty)
03:54 springle: xtrabackup clone es1010 to es2008
03:53 springle: xtrabackup clone es1007 to es2006
03:37 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool es1007 and es1010 (duration: 00m 09s)
03:13 logmsgbot: LocalisationUpdate completed (1.25wmf4) at 2014-10-17 03:13:33+00:00
02:39 logmsgbot: LocalisationUpdate completed (1.25wmf3) at 2014-10-17 02:39:26+00:00
01:16 mutante: powering up server formerly known as cp1001
00:47 logmsgbot: hoo Synchronized php-1.25wmf4/extensions/Wikidata/: Fix ORMTable usage, IE 11 freeze bug and adapt to further core changes (duration: 00m 14s)

October 16

23:07 mutante: RT - removed global permission for privileged users to create tickets - should not affect anyone because users are either not privileged or get this from other groups - need it to be flexible about readonly queues in RT - let me know if any issues
22:15 K4-713: undo last payments settings change
22:09 K4-713: payments settings updates
20:15 logmsgbot: reedy Synchronized database lists: (no message) (duration: 00m 18s)
20:15 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 17s)
20:10 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.25wmf4
20:07 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikipedias to 1.25wmf3
20:06 logmsgbot: reedy Synchronized database lists: (no message) (duration: 00m 19s)
20:06 logmsgbot: reedy Finished scap: testwiki to 1.25wmf4 (duration: 46m 21s)
19:36 K4-713: payments localsettings updates - supported countries and fraud filter settings
19:19 logmsgbot: reedy Started scap: testwiki to 1.25wmf4
19:09 K4-713: updated payments to 14c415fcfc3cade9a1
18:22 ottomata: restarted varnishkafka on cp3021
17:47 logmsgbot: ejegg Synchronized php-1.25wmf2/extensions/CentralNotice/: Update CentralNotice (duration: 00m 04s)
17:29 logmsgbot: ejegg Synchronized php-1.25wmf3/extensions/CentralNotice/: Update CentralNotice (duration: 00m 08s)
17:29 bblack: depooled mw1114 for api
15:31 logmsgbot: marktraceur Synchronized wmf-config/Wikibase.php: [SWAT] Define client CSS classes for new wikidata badges (duration: 00m 05s)
15:30 marktraceur: Sorry, that was in fact adding NS_PROPERTY to the search configuration, mistyped.
15:29 logmsgbot: marktraceur Synchronized wmf-config/InitialiseSettings.php: [SWAT] Enable wgCopyUploadsFromSpecialUpload on testwiki (duration: 00m 05s)
15:26 logmsgbot: marktraceur Synchronized php-1.25wmf3/extensions/Wikidata/: [SWAT] [wmf3] Update CSS for Wikidata badges (duration: 00m 11s)
15:25 _joe_: repooling mw1189
15:25 logmsgbot: marktraceur Synchronized php-1.25wmf2/extensions/Wikidata/: [SWAT] [wmf2] Update CSS for Wikidata badges (duration: 00m 11s)
15:11 logmsgbot: marktraceur Synchronized wmf-config/InitialiseSettings.php: [SWAT] Enable wgCopyUploadsFromSpecialUpload on testwiki, Add commons to wgImportSources for sewikimedia (duration: 00m 05s)
15:05 logmsgbot: marktraceur Synchronized wmf-config/InitialiseSettings.php: Re-enable prerendering of thumbnails for new files. (duration: 00m 05s)
15:02 andrewbogott: removed 'publicKey' and 'accessKey' from ldap user records -- they were obsolete and making everyone nervous
14:08 _joe_: depooling mw1189 from the api pool, reimaging with hhvm
10:11 hashar: Updated our Jenkins Job Builder forked repository ( ee80dbc..7ad4386 ). No job configuration impact.
09:52 paravoid: rebooting ms-be1003, sdn3/xfs troubles
09:30 paravoid: powercycling tmh1002, unresponsive, stuck, no vga output
08:44 godog: powercycle ms-be1007, no ssh and no console
07:53 hashar: Jenkins: upgrading PHPUnit from 3.7.28 to 3.7.37 164683 wikitech-l announce
04:44 springle: xtrabackup clone es1004 to es2001
04:24 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Oct 16 04:24:21 UTC 2014 (duration 24m 20s)
02:46 logmsgbot: LocalisationUpdate completed (1.25wmf3) at 2014-10-16 02:46:46+00:00
02:25 logmsgbot: LocalisationUpdate completed (1.25wmf2) at 2014-10-16 02:24:55+00:00
01:31 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1061 (duration: 00m 07s)
00:49 Krinkle: Zuul queue made unstuck by fixing the clogged build (see bug 72113)
00:14 Krinkle: Jenkins queue is stuck (99% free executors, but it's not running any of Zuul's pending jobs)

October 15

23:10 logmsgbot: maxsem Synchronized php-1.25wmf2/extensions/MobileFrontend/: (no message) (duration: 00m 04s)
23:04 logmsgbot: maxsem Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/166886/ (duration: 00m 05s)
21:37 ori: restarting hhvm on mw1114, this time with luasandbox
21:20 hoo: Gracefulled apache on mw1115
21:06 ejegg: updated crm from 05e5388df34059c651223d53fb2986ac1c39a2d9 to d6a75b6df4482de61da372fa653902db7ca12766
21:02 legoktm_: running mwscript extensions/CentralAuth/maintenance/migrateAccount.php on terbium for broken accounts (bug 61876)
20:48 mutante: deleting/shredding ishmael cert/keys from neon
20:41 hoo: Deleted 147 orphan wb_terms entries (bug 71914)
20:40 ori: restarting apache on mw1115 to test luasandbox
20:25 logmsgbot: yurik Synchronized wmf-config/mobile.php: Disable font for ZeroBanner (duration: 01m 05s)
20:17 hoo: Deleted ten orphan wb_entity_per_page rows on wikidata
20:17 logmsgbot: yurik Synchronized php-1.25wmf3/extensions/ZeroPortal: updating ZeroPortal to master (duration: 01m 11s)
20:13 arlolra: restarted ocg service
20:02 logmsgbot: yurik Synchronized php-1.25wmf2/extensions/ZeroPortal: updating ZeroPortal to master (duration: 01m 15s)
19:47 logmsgbot: yurik Synchronized php-1.25wmf3/extensions/ZeroBanner/: Latest ZeroBanner (duration: 01m 11s)
19:37 logmsgbot: yurik Synchronized php-1.25wmf2/extensions/ZeroBanner/: Latest ZeroBanner (duration: 01m 07s)
19:05 ottomata: deployed webstatscollector 0.4 on oxygen (filter) and gadolinium (collector)
18:46 ori: adjusting pybal weight for mw1114 back up to 20 to confirm that leak is in luasandbox
17:57 ori_: installed lua5.1 on mw1114 so i can switch scribunto to luastandalone and thus potentially isolate the leak to luasandbox
15:38 andrewbogott: running sync-common on virt1000
15:36 logmsgbot: marktraceur Synchronized php-1.25wmf3/extensions/OpenStackManager/: [SWAT] [wmf3] Make list=novainstances available to anons (duration: 00m 06s)
15:36 logmsgbot: marktraceur Synchronized php-1.25wmf2/extensions/OpenStackManager/: [SWAT] [wmf2] Make list=novainstances available to anons (duration: 00m 05s)
15:22 paravoid: AMS-IX renumbering: remove old IP from interface, migration over; > 75% of total peers migrated, accounting for much more bandwidth/routes
15:13 logmsgbot: marktraceur Synchronized php-1.25wmf3/extensions/VisualEditor/modules/ve-mw/ui: [SWAT] [wmf3] Update OOjs UI to v0.1.0-pre (d74a46ca6a) and VisualEditor-MediaWiki to Ie06056b (duration: 00m 05s)
15:12 logmsgbot: marktraceur Synchronized php-1.25wmf3/resources/lib/oojs-ui: [SWAT] [wmf3] Update OOjs UI to v0.1.0-pre (d74a46ca6a) and VisualEditor-MediaWiki to Ie06056b (duration: 00m 06s)
14:32 paravoid: AMS-IX renumbering: move all remaining ASNs to the new space
14:20 Coren: Not reimaging mw1035 after all; hhvm is in our base, killing our ramz.
14:16 paravoid: AMS-IX renumbering: peering with (renumbered) top-10 ASNs + ASNs with large number of prefixes
14:12 Coren: reimaging mw1035 for great justice!!! (HHVM)
14:11 _joe_: powercycling mw1205, down since this morning, console blank
13:29 Jeff_Green: dist-upgrade and reboot indium
12:40 hashar: disabled/reenabled gearman plugin at https://integration.wikimedia.org/ci/manage
12:34 hashar: Zuul frozen \O/
11:38 _joe_: depooling mw1114 again
10:59 paravoid: AMS-IX renumbering: adding second IP, peering with RS1
10:47 _joe_: repooled mw1114 with reduced load, using jemalloc with prof_leak enabled for sampling. will depool again soon
10:27 _joe_: depooling mw1114, stopping puppet for debugging purposes
09:24 godog: enable container sync for commons containers
08:09 hashar: restarting Jenkins
08:06 hashar: Jenkins: upgrading Gearman plugin to Patchset 9 of https://review.openstack.org/#/c/125755/
07:52 springle: ongoing schema changes rev_content_(model|format) multiple shards, ok to kill osc_host.sh jobs on terbium in emergency
06:58 _joe_: restarting hhvm on mw1114 to avoid memory exhaustion
05:54 K4-713: updated payments to 944596744e0de23fee098
03:34 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Oct 15 03:34:50 UTC 2014 (duration 34m 49s)
02:28 logmsgbot: LocalisationUpdate completed (1.25wmf3) at 2014-10-15 02:28:00+00:00
02:15 logmsgbot: LocalisationUpdate completed (1.25wmf2) at 2014-10-15 02:15:31+00:00

October 14

23:42 logmsgbot: catrope Synchronized php-1.25wmf3/extensions/Collection: SWAT (duration: 00m 08s)
23:42 logmsgbot: catrope Synchronized php-1.25wmf2/extensions/Collection: SWAT (duration: 00m 04s)
22:24 hoo: Gracefulled apache on mw1075
19:16 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 17s)
19:07 logmsgbot: reedy Synchronized images/: (no message) (duration: 00m 14s)
18:52 Reedy: Purged php-1.24wmf18 from mediawiki-appservers
18:41 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias to 1.25wmf3
18:20 logmsgbot: reedy Purged l10n cache for 1.25wmf1
18:15 godog: stop diamond on virt1000 and zirconium to test
17:52 godog: conntrack full on virt1000 and zirconium, suspected diamond collector runaway
16:33 _joe_: repooling mw1114
15:27 _joe_: reimaging mw1114 with HHVM - first server in the API pool; depooling and reinstalling now.
15:22 logmsgbot: marktraceur Synchronized php-1.25wmf3/resources/lib/oojs-ui/: [SWAT] [wmf3] OOjs UI: New pull-through to 837b2f733e to fix a missed dependency (duration: 00m 06s)
15:12 logmsgbot: marktraceur Synchronized php-1.25wmf3/includes/api/ApiQueryBacklinks.php: [SWAT] [wmf3] API: Fix ApiQueryBacklinks redirlinks (duration: 00m 05s)
15:11 logmsgbot: marktraceur Synchronized php-1.25wmf2/includes/api/ApiQueryBacklinks.php: [SWAT] [wmf2] API: Fix ApiQueryBacklinks redirlinks (duration: 00m 06s)
13:56 godog: enable container sync on non-commons sharded containers
12:27 hashar: Jenkins restarting
12:25 hashar: Jenkins: upgrading Gearman plugin to fix jobs registrations ( cherry picked https://review.openstack.org/#/c/125755/ and compiled it via maven ).
10:38 godog: enable container sync on non-sharded originals containers
09:46 godog: upload python-elasticsearch to trusty-wikimedia
09:44 _joe_: running sync-common on mw1163
03:34 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Oct 14 03:34:53 UTC 2014 (duration 34m 52s)
02:27 logmsgbot: LocalisationUpdate completed (1.25wmf3) at 2014-10-14 02:27:23+00:00
02:15 logmsgbot: LocalisationUpdate completed (1.25wmf2) at 2014-10-14 02:15:19+00:00
01:42 springle: restarted gitblit

October 13

22:17 akosiaris: ran update-rc.d -f puppetmaster remove on palladium/strontium
21:47 akosiaris: restarting apache on palladium, strontium
20:20 _joe_: load test done. HHVM is awesome
19:51 _joe_: load test on HHVM starting
19:13 logmsgbot: oblivian Synchronized wmf-config/CommonSettings.php: Serving 5% of anons with HHVM (duration: 00m 12s)
17:12 logmsgbot: oblivian Synchronized wmf-config/CommonSettings.php: Serving 2% of anons with HHVM (duration: 00m 06s)
16:19 _joe_: restarting gitblit, stuck in GC probably
14:30 _joe_: rolling restart of the ocg cluster
13:10 godog: enable container sync on wikibooks originals as test
11:01 logmsgbot: hoo Synchronized wmf-config/InitialiseSettings.php: Add edh-www.adw.uni-heidelberg.de to the wgCopyUploadsDomains whitelist (duration: 00m 08s)
10:30 godog: enable container sync as a test on wiktionary*-local-public
09:49 godog: test enable container sync on wikibooks-it-local-thumb
08:24 godog: stopping swift on ms-be1013, out of disk space on /
03:33 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Oct 13 03:33:51 UTC 2014 (duration 33m 50s)
02:37 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: s1 api traffic prefer db1065 and db1066 (duration: 00m 07s)
02:28 logmsgbot: LocalisationUpdate completed (1.25wmf3) at 2014-10-13 02:28:12+00:00
02:16 logmsgbot: LocalisationUpdate completed (1.25wmf2) at 2014-10-13 02:16:03+00:00
02:10 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1062, warm up (duration: 00m 07s)
01:58 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: s4 api traffic prefer db1059 (duration: 00m 08s)

October 12

13:21 logmsgbot: reedy Synchronized php-1.25wmf3/extensions/CentralAuth/: (no message) (duration: 00m 18s)
13:19 logmsgbot: reedy Synchronized php-1.25wmf3/includes/templates/: Unbreak user signup (duration: 00m 15s)
03:30 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Oct 12 03:30:17 UTC 2014 (duration 30m 16s)
02:28 logmsgbot: LocalisationUpdate completed (1.25wmf3) at 2014-10-12 02:28:03+00:00
02:16 logmsgbot: LocalisationUpdate completed (1.25wmf2) at 2014-10-12 02:16:12+00:00

October 11

23:39 Reedy: killed both logstash events on logstash100[23]. Started logstash again after
23:33 Reedy: killed both logstash events on logstash1001. Started logstash again after
23:23 Reedy: Started logstash on logstash1001
23:21 bd808: logstash not showing any udp2log events after 2014-10-10T01:42:22.000Z
16:09 logmsgbot: hoo Synchronized php-1.25wmf3/extensions/CentralAuth/: Deploying forgotten backport from Thursday: SpecialCentralAutoLogin: Fix getting files after file layout change (duration: 00m 08s)
06:01 bblack: put ms-fe.svc.codfw.wmnet into downtime for the next two days, because I'm tired of getting paged about it :p
03:35 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Oct 11 03:35:00 UTC 2014 (duration 34m 59s)
02:29 logmsgbot: LocalisationUpdate completed (1.25wmf3) at 2014-10-11 02:29:32+00:00
02:17 logmsgbot: LocalisationUpdate completed (1.25wmf2) at 2014-10-11 02:17:18+00:00

October 10

19:41 Coren: mw1026 rebuild complete (now with HHVM goodness in every bite!)
18:02 Coren: begin reimaging of mw1026
17:32 ejegg: updated civicrm from c684b07805ad75f10796fd4dbb82ece4818a7aa3 to 05e5388df34059c651223d53fb2986ac1c39a2d9
17:17 Coren: mw1027 rebuild complete (now with HHVM goodness in every bite!)
16:18 paravoid: stopping swift on ms-be1013, debugging
14:07 hashar: Disconnecting / reconnecting Jenkins/Zuul gearman as per https://bugzilla.wikimedia.org/show_bug.cgi?id=63760#c12
13:29 logmsgbot: manybubbles Synchronized wmf-config/CirrusSearch-common.php: Add configuration so cirrus can build an index to speed up regex searches (duration: 00m 04s)
11:22 godog: rolling restart of container-server on ms-be1*
11:13 godog: rolling restart of container-server on ms-be2*
07:12 legoktm: running migratePass0 across all wikis
06:49 _joe_: load testing done
06:22 _joe_: doing some load testing on HHVM (api)
04:38 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Oct 10 04:38:38 UTC 2014 (duration 38m 37s)
04:22 ^d: elasticsearch upgrade from 1.3.2 -> 1.3.4 complete for all 18 nodes. Sporadic icinga warnings about health should go away now
04:19 springle: upgrade db1046 mariadb 10
03:44 springle: enable purging of old eventlogging data from specific tables on m2-master, as per analytics@ discussion
03:12 logmsgbot: LocalisationUpdate completed (1.25wmf3) at 2014-10-10 03:12:51+00:00
02:39 logmsgbot: LocalisationUpdate completed (1.25wmf2) at 2014-10-10 02:39:23+00:00
00:53 legoktm: running initSiteStats.php on wikidatawiki
00:47 legoktm: ran updateArticleCount.php --wiki=ckbwiki (bug 71884)

October 9

23:57 logmsgbot: maxsem Synchronized php-1.25wmf2/resources/: (no message) (duration: 00m 03s)
23:57 logmsgbot: maxsem Synchronized php-1.25wmf3/resources/: (no message) (duration: 00m 04s)
23:55 logmsgbot: maxsem Synchronized php-1.25wmf2/extensions/OpenStackManager/: (no message) (duration: 00m 04s)
23:52 logmsgbot: maxsem Synchronized php-1.25wmf3/extensions/MobileApp: (no message) (duration: 00m 03s)
23:50 logmsgbot: maxsem Synchronized php-1.25wmf2/extensions/MobileApp: (no message) (duration: 00m 04s)
23:45 logmsgbot: maxsem Synchronized php-1.25wmf2/extensions/Flow/: (no message) (duration: 00m 09s)
23:43 logmsgbot: maxsem Synchronized php-1.25wmf3/extensions/Wikidata/: (no message) (duration: 00m 10s)
23:40 logmsgbot: maxsem Synchronized php-1.25wmf2/extensions/Wikidata/: (no message) (duration: 00m 10s)
22:24 K4-713: updated payments wiki to 17f822a64742bd13e
20:33 subbu: deployed parsoid version 644071d2
20:03 Jeff_Green: rebooting samarium
19:51 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 15s)
19:42 manybubbles: upgrading elastic1014
19:34 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.25wmf3
19:31 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikipedias to 1.25wmf2
19:21 logmsgbot: reedy Finished scap: testwiki to 1.25wmf3 and build l10n cache (take 2) (duration: 30m 00s)
18:51 logmsgbot: reedy Started scap: testwiki to 1.25wmf3 and build l10n cache (take 2)
18:51 bd808: cherry-picked I3ae9edab2505c37945fe66863721913a6d33223c to scap
18:42 logmsgbot: reedy scap failed: TypeError bufsize must be an integer (duration: 08m 33s)
18:34 logmsgbot: reedy Started scap: testwiki to 1.25wmf3 and build l10n cache
17:56 Coren: begin reimaging of mw1027
17:55 Coren: done reimaging of mw1028. Now hhvm_appserver
16:58 _joe_: gracefully restarted again api apaches to recover 500s
16:43 godog: re-enable puppet on ms-fe/ms-be in eqiad
16:39 godog: re-enable puppet on ms-fe/ms-be in codfw
16:23 logmsgbot: oblivian gracefulled all apaches
15:34 hashar: restarted Zuul
15:31 Coren: begin reimaging of mw1028
15:31 Coren: done reimaging of mw1029. Now hhvm_appserver
15:28 logmsgbot: manybubbles Synchronized php-1.25wmf2/extensions/VisualEditor/: SWAT deploy VE cherry-pick (duration: 00m 06s)
15:26 andrewbogott: upgraded wikitech-static to 1.25wmf2
14:02 akosiaris: updated pybal on palladium for citoid
13:54 Coren: begin reimaging of mw1029
12:01 logmsgbot: reedy Purged l10n cache for 1.24wmf22
11:59 springle: converted some librenms tables to innodb on db1001 m1-master. should be a no-op
11:57 springle: xtrabackup db1016 to db2010
11:39 manybubbles: starting upgrade of elastic1009
11:11 _joe_: reenabled puppet on mw*
11:11 godog: disabled puppet in ms-fe/ms-be in eqiad/codfw to merge container-sync changes
10:35 _joe_: disabling puppet on most mw* hosts while testing apache changes
08:17 _joe_: repooling mw102[3-5],mw1053 in the hhvm pool
07:15 _joe_: reimaging mw102[3-5] to hhvm
07:02 _joe_: reinstalling mw1053
03:33 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Oct 9 03:33:47 UTC 2014 (duration 33m 46s)
02:30 logmsgbot: LocalisationUpdate completed (1.25wmf2) at 2014-10-09 02:30:03+00:00
02:17 logmsgbot: LocalisationUpdate completed (1.25wmf1) at 2014-10-09 02:17:53+00:00

October 8

23:31 mutante: importing xml dump to cawikimedia
23:28 andrewbogott: running sync-common on virt1000
23:27 logmsgbot: demon Synchronized php-1.25wmf2/extensions/OpenStackManager: (no message) (duration: 00m 06s)
23:23 logmsgbot: demon Synchronized php-1.25wmf2/extensions/Flow: (no message) (duration: 00m 05s)
23:14 logmsgbot: demon Synchronized php-1.25wmf2/extensions/CommonsMetadata: (no message) (duration: 00m 06s)
22:40 mutante: virt0 - deleted salt key, revoked puppet cert, removed from site.pp
22:39 Reedy: Removed openjdk-6-* from logstash100[1-3]
22:07 subbu: updated OCG to version def24eca
22:07 mutante: tin - deleted empty pmtpa dsh group files
22:02 mutante: tin - there are dozens of dsh groups that have been removed from repo long time ago but never got purged, but it isn't easy to tell what might still be used, so deleting all and letting puppet recreate might be risky?
20:48 legoktm: currently running /home/legoktm/fixBug71749.php on terbium
19:49 Reedy: logstash upgraded to 1.4.2-1 on logstash100[1-3]
19:46 Reedy: Created flow tables on officewiki
17:04 _joe_: load testing done
16:44 _joe_: doing some load testing on the hhvm servers
16:09 Reedy: elasticsearch upgraded on logstash1001 to 1.3.4
16:07 Reedy: elasticsearch upgraded on logstash100[23] to 1.3.4
15:08 logmsgbot: anomie Synchronized php-1.25wmf2/extensions/VisualEditor/lib/ve/src/ce/ve.ce.Surface.js: SWAT: Revert "ve.ce.Surface: Magic workaround for broken Firefox cursoring" gerrit:164593 (duration: 00m 09s)
15:04 manybubbles: upgrading elastic1002 now
14:53 _joe_: repooling mw1163
14:20 hasharBusy: disabled puppet on gallium to make sure a zuul config change sticks. 165481
14:19 manybubbles: fixed missing elasticsearch extension jar file and brought elastic1001 back up. git fat betrayed us.
14:14 hasharBusy: hard restarting zuul
14:03 manybubbles: upgrading elastic1001 uncovered a bug in our highlighter that I have yet to diagnose. I removed that server from the rotation so we'll continue to use the old version.
13:49 _joe_: depooling && reimaging mw1163
12:44 manybubbles: upgraded elastic1001 to Elasticsearch 1.3.2 -> 1.3.4, experimental highlighter 0.0.11 -> 0.0.12, and installed trigram accelerated regex search 0.0.1
12:32 manybubbles: deploying new elasticsearch plugins in preparation for minor Elasticsearch version upgrade today
11:02 logmsgbot: reedy Synchronized docroot and w: good riddance to bad docroots (duration: 00m 16s)
09:27 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: isolate api traffic on s2 to db1054 and db1060 (duration: 01m 20s)
09:03 springle: killed masses of sleeping connections on s2 slaves
08:11 paravoid: powercycling rhenium, unresponsive
07:55 springle: restart db2011
04:31 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Oct 8 04:31:03 UTC 2014 (duration 31m 2s)
03:18 logmsgbot: LocalisationUpdate completed (1.25wmf2) at 2014-10-08 03:18:44+00:00
02:40 logmsgbot: LocalisationUpdate completed (1.25wmf1) at 2014-10-08 02:40:48+00:00
02:02 logmsgbot: tstarling Finished scap: (no message) (duration: 09m 01s)
01:53 logmsgbot: tstarling Started scap: (no message)
01:35 logmsgbot: tstarling scap failed: CalledProcessError Command '('/usr/bin/git', 'rev-list', '-1', '@{upstream}')' returned non-zero exit status 128 (duration: 00m 14s)
01:35 logmsgbot: tstarling Started scap: (no message)
01:32 logmsgbot: tstarling scap failed: CalledProcessError Command '('/usr/bin/git', 'rev-list', '-1', '@{upstream}')' returned non-zero exit status 128 (duration: 00m 14s)
01:31 logmsgbot: tstarling Started scap: (no message)
01:16 logmsgbot: tstarling scap failed: CalledProcessError Command '('/usr/bin/git', 'rev-list', '-1', '@{upstream}')' returned non-zero exit status 128 (duration: 00m 25s)
01:16 logmsgbot: tstarling Started scap: update for Wikidata crash bug
00:41 mutante: searchidx1001 - same, fixed duplicate salt-minion
00:40 mutante: osmium - salt-minion was running twice, stopped both, killed one, restarted properly
00:38 mutante: cp3016 - why you report failed puppet unlike everyone else but then it works
00:33 springle: long schema changes running from terbium. ok to kill osc_host.sh in emergency
00:01 logmsgbot: ori Synchronized php-1.25wmf2/extensions/WikimediaEvents: Update WikimediaEvents for If9cdde0f0 (duration: 00m 03s)
00:01 logmsgbot: ori Synchronized php-1.25wmf1/extensions/WikimediaEvents: Update WikimediaEvents for If9cdde0f0 (duration: 00m 04s)

October 7

23:29 andrewbogott: restarting every shutoff VM on virt1005
23:20 logmsgbot: maxsem Synchronized wmf-config/InitialiseSettings.php: SWAT: https://gerrit.wikimedia.org/r/165393 (duration: 00m 04s)
22:54 cscott: updated OCG to version c778ea8b898f8ad8c2b7ad9de78a75469e7ed061
22:50 mutante: db68,tarin - revoke the last remaining pmtpa certs
22:48 logmsgbot: ori Synchronized php-1.25wmf1/extensions/WikimediaEvents: Update WikimediaEvents for Ied71b5032: Groundwork for HHVM productivity analysis (duration: 00m 04s)
22:47 mutante: db60,db69-74,es4,es7,es10 - remove from icinga monitoring, puppet certs, salt keys
22:42 logmsgbot: ori Synchronized php-1.25wmf2/extensions/WikimediaEvents: Update WikimediaEvents for Ied71b5032: Groundwork for HHVM productivity analysis (duration: 00m 04s)
22:40 mutante: fenari - revoked puppet cert, rm salt key, rm from icinga ...
22:37 andrewbogott: cycling power on virt1005 -- unresponsive
21:27 mutante: mchenry - revoke puppet cert, clean storedconfigs/rm from icinga
21:04 mutante: dobson - revoke puppet cert, delete from storedconfigs/icinga, deleted from dsh
20:56 K4-713: altered worldpay account settings for France on payments
20:48 mutante: mexia - revoke salt,puppet,monitoring,storedconfigs
20:27 mutante: pdf2/pdf3 - revoked puppet certs, removed from DNS & icinga
19:42 mutante: temp. stopped icinga-wm
19:41 mutante: restarting apache on palladium - mod_passenger fail
19:29 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 23s)
19:29 logmsgbot: reedy Synchronized database lists: (no message) (duration: 00m 20s)
19:20 Reedy: Created EducationProgram tables on cawiki
19:19 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 26s)
19:09 ^d: cleared old files from runs on gallium tmpfs, testing should recover now.
18:45 csteipp: deployed fix for bug 71749
18:43 mutante: sanger - deleted salt key, revoked puppet cert, rm icinga stored config, already out of DNS - Killing sanger.wikimedia.org...done.
18:42 logmsgbot: csteipp Synchronized php-1.25wmf2/extensions/CentralAuth/CentralAuthPlugin.php: (no message) (duration: 00m 06s)
18:34 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias to 1.25wmf2
18:25 ^d: jenkins tmpfs ran out of space again, tests failing
16:24 logmsgbot: reedy Synchronized database lists: echo for fawikivoyage (duration: 00m 20s)
16:22 Reedy: Created echo tables on fawikivoyage on extension1 cluster
15:00 logmsgbot: reedy Synchronized docroot and w: (no message) (duration: 00m 14s)
14:31 logmsgbot: reedy Synchronized docroot and w: Fixup noc (duration: 00m 16s)
10:53 godog: restart commons swiftrepl from ms-fe1003 and non-commons from ms-fe1004 to avoid maxing out copper's nic
10:09 godog: start swiftrepl of commons originals eqiad -> codfw
09:56 godog: start swiftrepl of non-commons originals eqiad -> codfw
06:02 logmsgbot: ori Synchronized php-1.25wmf1/includes/objectcache/HashBagOStuff.php: I0b0b5f01: HashBagOStuff: use the value itself as the CAS token (duration: 00m 06s)
06:02 logmsgbot: ori Synchronized php-1.25wmf2/includes/objectcache/HashBagOStuff.php: I0b0b5f01: HashBagOStuff: use the value itself as the CAS token (duration: 00m 07s)
03:28 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Oct 7 03:28:09 UTC 2014 (duration 28m 8s)
02:26 logmsgbot: LocalisationUpdate completed (1.25wmf2) at 2014-10-07 02:26:04+00:00
02:14 logmsgbot: LocalisationUpdate completed (1.25wmf1) at 2014-10-07 02:14:21+00:00
01:16 logmsgbot: ori Synchronized php-1.25wmf1/extensions/Wikidata: Ie92da71 / I44f1dce: Update Wikidata, fixes for serialization issues (duration: 00m 09s)
01:15 logmsgbot: ori Synchronized php-1.25wmf2/extensions/Wikidata: Ie92da71 / I44f1dce: Update Wikidata, fixes for serialization issues (duration: 00m 10s)
00:34 Tim: core dumps were enabled on mw1088, unexpectedly started gathering natural segfault traffic

October 6

23:49 mutante: tarin, nfs-1 - revoked salt key,puppet cert, stored configs
23:06 logmsgbot: maxsem Synchronized php-1.25wmf2/extensions/MobileFrontend/: (no message) (duration: 00m 04s)
23:06 logmsgbot: maxsem Synchronized php-1.25wmf1/extensions/MobileFrontend/: (no message) (duration: 00m 04s)
23:05 logmsgbot: maxsem Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/165099/ https://gerrit.wikimedia.org/r/#/c/164902/ (duration: 00m 04s)
22:28 Tim: on mw1088 debugging crash bug 71542
21:59 hoo: Reverted wd:Q17939676 to 157541810 and edit=sysop
21:47 cscott: updated OCG to version bbdf4c6400cfbbc6030114ad16e1a6f7025eab2c
20:21 awight: backfilled recurring GC glitch from FR #2018, 3342 records affected.
20:16 subbu: deployed parsoid sha 13a53ab3 (deploy repo sha 38d44ada7)
19:03 akosiaris: issued cf disable and halt on nas1-a.pmtpa.wmnet nas1-b.pmtpa.wmnet. They are officially down :)
17:22 _joe_: hhvm load test finished
17:03 _joe_: depooling and repooling progressively hhvm appservers to see performance under load
16:57 Krinkle: git-deploy: Deploying integration/slave-scripts 0b85d48
16:19 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: Set $wgPercentHHVM to 1 (duration: 00m 27s)
15:54 cscott: updated OCG to version aee3712b352f51f96569de0bcccf3facf654e688
15:45 GroggyPanda: deleted graphite data for deployment-rsync02 by hand on labmon1001, since instance has been dead. Need to move to shinken + dynamic host.cfg
15:37 logmsgbot: manybubbles Synchronized php-1.25wmf1/extensions/Wikidata/: SWAT update wikidata (duration: 00m 10s)
15:23 logmsgbot: manybubbles Synchronized php-1.25wmf2/extensions/Wikidata/: SWAT update wikidata (duration: 00m 10s)
15:22 hashar: Zuul jobs proceeding again
15:22 godog: swiftrepl replicating non-sharded originals containers eqiad -> codfw
15:22 logmsgbot: manybubbles Synchronized wmf-config/: SWAT Add tracking categories for files with attribution problems (duration: 00m 06s)
15:19 cscott: ran 'sudo -u ocg -g ocg nodejs-ocg scripts/run-garbage-collect.js -c /home/cscott/config.js' from /home/cscott/ocg/mw-ocg-service in order to clear caches (working around https://gerrit.wikimedia.org/r/164644 ) on ocg100x.eqiad.wmnet
14:51 cmjohnson1: disconnecting Tampa servers
13:46 godog: starting test swiftrepl run on wikibooks eqiad -> codfw
11:49 _joe_: done restarting ocg servers
11:34 _joe_: rolling restart and cleaning of ocg nodes, trying to unlock pdf generation
11:11 mark: Shutdown tarin
11:11 mark: Shutdown sanger
09:27 _joe_: cleaned ocg another time
09:07 mark: Stopped dovecot on sanger
08:06 _joe_: cleaned ocg1001, again
05:57 Nemo_bis: "book creator seems stuck": PDF servers at 97 % CPU, little traffic, enough disk free for about 1 day more
03:26 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Oct 6 03:26:31 UTC 2014 (duration 26m 30s)
02:59 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1060 (duration: 00m 06s)
02:28 logmsgbot: LocalisationUpdate completed (1.25wmf2) at 2014-10-06 02:28:42+00:00
02:17 logmsgbot: LocalisationUpdate completed (1.25wmf1) at 2014-10-06 02:17:40+00:00

October 5

22:28 Coren: Q183 superprotected as a safeguard
22:27 hoo: Q183 is on revision 116786096 again, please don't alter this further!
22:21 qchris: Updated gerrit's hooks-bugzilla to 6e1e659 (with hooks-its at a421db4)
22:11 hoo: WD:Q183 was frozen on version 120566337, see bug 71519 (and others)
21:23 hoo: Bypassed Wikibase restrictions and set https://www.wikidata.org/wiki/Q183 back to old serialization format
20:08 Nemo_bis: 22.03 < Ainali> It was just noticed on svwp village pump that http://stats.wikimedia.org is down
16:39 paravoid: restore ns1 routing to codfw
11:23 paravoid: adding static route for ns1 to rubidium (ns0) on cr1-eqiad to temporarily redirect its traffic while codfw is offline
03:19 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Oct 5 03:19:55 UTC 2014 (duration 19m 53s)
03:02 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: I707b5754: Enable LuaSandbox profiling when is true (duration: 00m 07s)
02:22 logmsgbot: LocalisationUpdate completed (1.25wmf2) at 2014-10-05 02:22:47+00:00
02:13 logmsgbot: LocalisationUpdate completed (1.25wmf1) at 2014-10-05 02:13:26+00:00

October 4

21:08 _joe_: cleaning ocg1001 tmpfs from a 32 gb pdf file
19:59 jgage: restarted pdns on virt1000 for ldap config update
07:08 springle: powercycle es1004
03:27 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Oct 4 03:27:20 UTC 2014 (duration 27m 19s)
02:25 logmsgbot: LocalisationUpdate completed (1.25wmf2) at 2014-10-04 02:25:04+00:00
02:15 logmsgbot: LocalisationUpdate completed (1.25wmf1) at 2014-10-04 02:15:02+00:00
01:01 bblack: depooling cp1045 for persistent cache wipe
00:01 andrewbogott: updated the default labs precise image: updated ldap setup, new /var/log partition

October 3

22:46 bd808: Restarting zuul on gallium
22:40 bd808: Trying a soft restart of zuul on gallium
22:37 bd808: NoConnectedServersError("No connected Gearman servers") in zuul.log on gallium
22:33 bd808|deploy: Updated integration/phpunit to 6c1d11d (Regenerate autoloader)
22:31 subbu: restarted Parsoid servers after another gradual cpu load creep
22:19 logmsgbot: aaron Synchronized wmf-config/InitialiseSettings.php: Fixed the parser cache type for labswiki (duration: 00m 03s)
21:55 andrewbogott: updated the default labs trusty image: updated packages, updated ldap setup, new /var/log partition
20:37 ori: disabling puppet on rbf1002 to test bloom filter config
20:31 logmsgbot: manybubbles Synchronized wmf-config/CirrusSearch-labs.php: noop update to sync beta configs (duration: 00m 04s)
20:30 cscott: (the above was on ocg100x.eqiad.wmnet)
20:30 cscott: ran 'sudo -u ocg -g ocg nodejs-ocg scripts/run-garbage-collect.js -c /home/cscott/config.js' from /home/cscott/ocg/mw-ocg-service in order to clear caches (working around https://gerrit.wikimedia.org/r/164644 )
20:23 andrewbogott: restarted zuul
20:22 logmsgbot: ori Synchronized wmf-config/mc.php: Ie1ed821a7: Set bloom cache config (duration: 00m 03s)
19:34 ori: when running puppet merge: fatal: Unable to create '/var/lib/git/operations/puppet/.git/refs/remotes/origin/production.lock': File exists.
19:28 hashar: Restarting Zuul sorry :-/
19:26 hashar: Zuul in some kind of death loop
19:05 mutante: purging old 'searchqa' scripts and logs from iron (gerrit 164429 removes from puppet)
18:46 mutante: restored dbtree from manual backup (should have been synced by scripts)
18:25 logmsgbot: aaron Synchronized php-1.25wmf2/extensions/CentralAuth: (no message) (duration: 00m 05s)
18:14 logmsgbot: maxsem Synchronized php-1.25wmf2/extensions/MobileFrontend/: https://gerrit.wikimedia.org/r/#/c/164543/ (duration: 00m 04s)
18:14 logmsgbot: maxsem Synchronized php-1.25wmf1/extensions/MobileFrontend/: https://gerrit.wikimedia.org/r/#/c/164543/ (duration: 00m 04s)
17:53 bblack: restarted parsoid varnishes
17:30 andrewbogott: enabling puppet on tungsten which is disabled for mysterious reasons
17:04 gwicke: restarting parsoids after CPU spike
16:50 ejegg: updated crm from 6f66294607d132230ef82fb4867c37a8700bfd4e to ef68d9fe98a64e819ebbdddbe5e13f83037607ce
15:34 _joe_: purging varnish cache for parsoid (RT 8528)
15:19 mark: shutdown pdf2 & pdf3
15:15 bblack: adding 10.2.1.0/24 aggregate in cr-[12].codfw
15:14 bblack: dropping 10.2.1.0/24 aggregate + static routes in cr2-pmtpa
14:37 bblack: testing dns server upgrade on baham
14:35 logmsgbot: manybubbles Synchronized wmf-config/CirrusSearch-labs.php: Noop - just keeps beta config in sync (duration: 00m 04s)
14:18 Jeff_Green: launched iodine:/opt/otrs/bin/otrs.RebuildFulltextIndex.pl per bugzilla #64473
13:07 Reedy: Updated minor_mime to varbinary(100) on image|filearchive|oldimage on foundationwiki
12:30 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool es1004 (duration: 00m 06s)
11:51 logmsgbot: reedy Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 15s)
11:49 logmsgbot: aude Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 10s)
11:37 springle: shutdown db60 db68 db69 db71 db72 db73 db74 es4 es7 es10
10:43 mark: Shutdown amaranth.toolserver.org's switchport on asw-d-pmtpa
09:35 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1063 (duration: 00m 07s)
08:31 akosiaris: deleting snmp community from nas1-a, nas1-b. I guess librenms is going to start complaining
08:20 akosiaris: unexporting, offline, destroying /vol/home_pmtpa on nas1-a
06:59 _joe_: depooling mw1022, then reimaging it
04:22 springle: upgrade db1063 mariadb 10
04:16 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Oct 3 04:16:35 UTC 2014 (duration 16m 34s)
04:04 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1063 (duration: 00m 08s)
03:17 logmsgbot: LocalisationUpdate completed (1.25wmf2) at 2014-10-03 03:17:28+00:00
02:44 logmsgbot: LocalisationUpdate completed (1.25wmf1) at 2014-10-03 02:44:39+00:00

October 2

23:15 logmsgbot: maxsem Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/164483 (duration: 00m 03s)
23:06 logmsgbot: maxsem Synchronized php-1.25wmf2/extensions/MobileFrontend/: (no message) (duration: 00m 04s)
22:53 logmsgbot: reedy Synchronized wmf-config/: Experimentally enable vips for larger (>50MP) tiff files (duration: 00m 15s)
22:38 mutante: icinga_broken_due_to_missing_hostgroup_counter incremented
21:43 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: Disabling prerendering of images from this morning's swat (duration: 00m 04s)
21:33 bblack: added LVS BGP config setup to cr[12]-codfw
20:46 logmsgbot: ori Synchronized php-1.24wmf22/extensions/NavigationTiming: Update NavigationTiming for cherry-picks (duration: 00m 03s)
20:44 logmsgbot: ori Synchronized php-1.25wmf1/extensions/NavigationTiming: Update NavigationTiming for cherry-picks (duration: 00m 04s)
20:11 bblack: stopping -> starting uwsgi/apache -type stuff on tungsten
19:55 aude: populated sites table for fawikivoyage
19:27 bd808: hosts that failed Trebuchet update of scap: virt0.wikimedia.org, fenari.wikimedia.org, mw1110.eqiad.wmnet, mw1053.eqiad.wmnet. mw1053.eqiad.wmnet only failed checkout
19:23 bd808: Trebuchet reports for scap sync "231/234 minions completed fetch; 230/234 minions completed checkout" Some stale entries need to be removed from Trebuchet redis cache
19:21 bd808: Updated scap to eff0d01 (Fix format specifier for error message)
19:12 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: fawikivoyage (duration: 00m 16s)
19:12 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: fawikivoyage
19:11 logmsgbot: reedy Synchronized database lists: fawikivoyage (duration: 00m 16s)
18:47 mutante: graceful'ed apache on mw1030,mw1164
18:44 logmsgbot: reedy Purged l10n cache for 1.24wmf21
18:43 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.25wmf2
18:41 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias to 1.25wmf2
18:39 logmsgbot: reedy Finished scap: testwiki to 1.25wmf2 (duration: 24m 19s)
18:32 cmjohnson1: replacing failed disk es1005
18:15 logmsgbot: reedy Started scap: testwiki to 1.25wmf2
17:57 manybubbles: going to try to restart lsearchd on the misc pool machines to see if that makes it responsive
17:42 ejegg: updated payments-wiki from 83464deed3b66da655ca5d1086852237c4793b71 to 9417bbd95057a87824be157dbbb5965a1f09d202
17:26 logmsgbot: aaron Synchronized php-1.25wmf1/maintenance/findMissingFiles.php: aa2eb3c0de08256822a2b0c985ebb3a6145d28cd (duration: 00m 05s)
16:30 ori: graceful'd all apaches for I98bcdbfc7: mediawiki: add vhost_combined log format to apache2.conf
15:32 logmsgbot: demon Synchronized php-1.25wmf1/extensions/Wikidata: (no message) (duration: 00m 20s)
15:32 logmsgbot: demon Synchronized php-1.25wmf1/includes/jobqueue/jobs/ThumbnailRenderJob.php: (no message) (duration: 00m 05s)
15:21 godog: shutting down nfs1
15:15 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 06s)
15:11 logmsgbot: demon Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 04s)
15:11 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 04s)
15:09 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 04s)
15:08 mark: Replaced exim4-daemon-light by exim4-daemon-heavy on tools-mail
15:06 logmsgbot: demon Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 04s)
15:05 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 05s)
13:30 hashar: Zuul back around :]
13:23 hashar: Zuul deadlocked somehow again :(
12:32 godog: rolling-restart swift on ms-be1*, saw increased load possibly as a cause of 5xx spike
12:07 godog: restarting rsyslog in eqiad
12:03 akosiaris: restarted apparmor throughout the fleet
11:58 hashar: Migrated all mediawiki-core-regression* jobs to Zuul cloner bug 71549
10:57 godog: restarted rsyslog in codfw
10:57 godog: restarted rsyslog in ulsfo
10:54 godog: restarted rsyslog in esams
09:28 godog: start rolling depooling/restart/pooling of swift frontends in eqiad to pick up syslog change
09:11 _joe_: removing /mnt/tmpfs/fd29e937fea41d186175bcb880ef96980825dd1c.rdf2latex from ocg1001, it contains a 32 gb pdf
09:08 _joe_: restarting node-ocg on ocg1001; a _lot_ of deleted files with the FD still opened
08:10 hashar: Jenkins has been upgraded to latest LTS version 1.565.3
07:33 hashar: Jenkins restarting
07:27 hashar: Stopping Jenkins
06:58 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool es1010 (duration: 00m 07s)
06:53 springle: upgrade es1010 mariadb 10, restart
06:36 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool es1010 (duration: 00m 07s)
06:07 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool es1008, re-enable writes to external storage cluster 25 (duration: 00m 06s)
06:00 ori: mw1053 still flooding error logs with "Unrecognized job type 'EchoNotificationDeleteJob'." Disabling Puppet and jobrunner for now, planning to investigate during SF daytime hours.
05:59 springle: upgrade es1008 mariadb 10, restart
05:21 MaxSem: Manual sync-common on mw1053
05:19 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool es1008 (duration: 00m 08s)
05:04 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: divert writes away from external storage cluster 25 (duration: 00m 10s)
04:40 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: upgraded db1073, repool, warm up (duration: 00m 07s)
04:22 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1073 (duration: 00m 13s)
04:08 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Oct 2 04:08:17 UTC 2014 (duration 8m 16s)
02:47 logmsgbot: LocalisationUpdate completed (1.25wmf1) at 2014-10-02 02:47:20+00:00
02:24 logmsgbot: LocalisationUpdate completed (1.24wmf22) at 2014-10-02 02:24:08+00:00

October 1

23:26 logmsgbot: demon Synchronized php-1.25wmf1/extensions/GuidedTour: (no message) (duration: 00m 04s)
23:25 logmsgbot: demon Synchronized php-1.24wmf22/extensions/GuidedTour: (no message) (duration: 00m 04s)
23:19 logmsgbot: demon Synchronized php-1.25wmf1/extensions/VisualEditor: (no message) (duration: 00m 04s)
23:09 Krinkle: Jenkins restart finished. Patched git-client-plugin seems to work as expected (bug 71533).
22:49 Krinkle: Deploy Jenkins git-client-plugin v1.4.6+wmf1 from https://github.com/wikimedia/git-client-plugin/commits/git-client-1.4.6+wmf1 (c80b05bb10985ab94c4c4217d07a0868087b5994) – https://wikitech.wikimedia.org/wiki/Nova_Resource:Integration/Jenkins_Plugin
21:28 logmsgbot: aaron Synchronized php-1.25wmf1/maintenance/findMissingFiles.php: 832ed2ce9938dc51fdb4190423ce03e93e65c639 (duration: 00m 05s)
21:04 mutante: fenari - shutdown -h now (omg) :)
20:32 hoo: Ran sync-common on mw1053 to stop "Unrecognized job type 'EchoNotificationDeleteJob'." exceptions
20:27 cscott: updated OCG to version 48c495e3656f528abe636ce0cd7562270505534f
19:40 logmsgbot: yurik Synchronized php-1.25wmf1/extensions/ZeroBanner/: (no message) (duration: 01m 05s)
19:38 logmsgbot: yurik Synchronized php-1.24wmf22/extensions/ZeroBanner/: (no message) (duration: 01m 03s)
19:17 mutante: fenari - removed from dsh - rejoice deployers, should be faster now
19:03 logmsgbot: yurik Synchronized php-1.24wmf22/extensions/ZeroBanner/: (no message) (duration: 01m 08s)
18:59 logmsgbot: yurik Synchronized php-1.24wmf22/extensions/ZeroBanner/: (no message) (duration: 01m 09s)
18:56 logmsgbot: yurik Synchronized php-1.24wmf22/extensions/ZeroBanner/: (no message) (duration: 01m 04s)
18:34 Krinkle: (..jenkins) The command runs fine when done in that workspace from shell. Looks like a bug with Jenkins Java abstraction layer.
18:28 ori: disabling puppet on mw1019 to test impact of ProxyBadHeader apache directive
18:25 Krinkle: Jenkins jobs for repos with git submodules broken ("git-submodule: git reset: not found")
17:41 andrewbogott: graceful'd apache on logstash1001 logstash1002 logstash1003
16:56 cmjohnson1: swapping disk db1020
16:32 godog: reverted change to syslog.eqiad.wmnet, back to nfs-home.pmtpa.wmnet
15:40 JetLaggedPanda: purged graphite logs for deployment-mediawiki04 by hand on labmon1001 to prevent it from causing issues on icinga, since the instance has been deleted previously
15:32 ottomata: starting upgrade of stat1002 from precise to trusty
15:30 hashar: Jenkins added jgit as a git provider under https://integration.wikimedia.org/ci/configure
15:30 logmsgbot: anomie Synchronized wmf-config/InitialiseSettings.php: SWAT: Rename wikibase debug log gerrit:164061 (duration: 00m 12s)
15:27 logmsgbot: anomie Synchronized php-1.25wmf1/extensions/Wikidata/: SWAT: Fix js error that breaks editing properties on Wikidata gerrit:164079 (duration: 00m 16s)
15:12 logmsgbot: anomie Synchronized wmf-config: SWAT: Disable the old mwlib PDF render service gerrit:163609 (duration: 00m 09s)
15:05 andrewbogott: switched icinga over to the new ldap servers. Seems to still work so far...
15:02 godog: switched syslog to lithium
15:01 logmsgbot: anomie Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: Enable thumbnail prerendering at upload time on Beta gerrit:163836 (duration: 00m 09s)
14:59 hashar: Jenkins changed git executable path from 'git' to '/usr/bin/git'
14:41 godog: testing syslog change on mw1060
14:40 mark: Shutdown mchenry
14:32 andrewbogott: turning virt0 off again. Soon we won't have a choice about this, trying to flush out issues in the meantime.
14:27 bblack: mexia powered off
14:08 mark: Stopped pdns_recursor on mchenry
14:02 mark: Shutting down dobson
13:49 godog: temporarily override syslog.eqiad.wmnet on mw1053 for testing
13:28 mark: Stopped DNS recursor on dobson
13:21 mark: Stopped OpenDJ on sanger
11:58 godog: reboot lithium for installation
10:38 _joe_: re-enabled puppet on mw1018, repooling in a few
10:16 paravoid: killed cp4006's stale puppet agent_disabled.lock, ran puppet
10:16 akosiaris: started spare disk zeroing process on nas1-a
10:16 akosiaris: destroyed backups aggregate on nas1-a
09:58 akosiaris: destroyed baculasd1, baculasd2 and fr_archive volumes on nas1
09:47 akosiaris: umount /home on fenari. fenari user homes no longer available
09:42 akosiaris: touch /etc/nologin on fenari. Non root logins disallowed
09:04 akosiaris: breaking the snapmirror relationships between nas1-a, nas1001-a. Effect: no more fr_archive syncing, fenari /home no longer is synced
08:35 _joe_: disabling puppet on mw1018, enabling debug logging to get more details about fcgi reported errors
03:51 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Oct 1 03:51:23 UTC 2014 (duration 51m 22s)
02:42 mutante: jenkins config used virt0, login was needed though to change the config. blocked Krinkle
02:38 mutante: bringing virt0 back up did indeed fix login on jenkins, also analytics-kafka appears to be still using it
02:36 logmsgbot: LocalisationUpdate completed (1.25wmf1) at 2014-10-01 02:36:43+00:00
02:31 mutante: virt0 - powering back up, suspecting it broke jenkins login
02:31 ori: mw1053 flooding exception logs with: "Unrecognized job type 'EchoNotificationDeleteJob'." Disabling jobrunner & Puppet
02:25 logmsgbot: LocalisationUpdate completed (1.24wmf22) at 2014-10-01 02:25:11+00:00