Server admin log/Archive 25

September 30

23:29 logmsgbot: spage Synchronized wmf-config/InitialiseSettings.php: Create log group for Echo (duration: 00m 11s)
23:27 logmsgbot: spage Synchronized php-1.25wmf1/extensions/Echo: Echo no-op (change reverted) (duration: 00m 09s)
22:55 ori: re-enabling puppet on mw1019
22:36 ori: disabling puppet on mw1019 to enable debug logging in apache
22:09 mutante: removing linne from DNS - was already shut down about 24 hours before
21:57 K4-713: updated prod civicrm to 477a5107a0c93ceac5214
21:44 ori: Spike of bitter irony from Nemo_bis on #wikimedia-operations starting 21:43 UTC
21:33 logmsgbot: ori Synchronized php-1.25wmf1/languages/Language.php: I672c699c (2/2) (duration: 00m 03s)
21:33 logmsgbot: ori Synchronized php-1.25wmf1/includes/specialpage/SpecialPageFactory.php: I672c699c (1/2) (duration: 00m 07s)
21:23 Nemo_bis: widespread reproducible 503 errors on wikidata and elsewhere
20:55 andrewbogott: powering down virt0, just to see what breaks
20:48 andrewbogott: shutting down pdns on virt0
20:48 andrewbogott: shutting down opendj on virt (temporary, a preview of tomorrow)
18:50 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 16s)
18:49 logmsgbot: reedy Synchronized database lists: (no message) (duration: 00m 15s)
18:41 mutante: pc1001-1003 - can't generate tmp files for percona monitoring checks -> puppet fail
18:24 mutante: killing silver from icinga and puppet
18:23 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 16s)
18:21 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias to 1.25wmf1
18:05 logmsgbot: ori Synchronized wmf-config/HHVMRequestInit.php: (no message) (duration: 00m 07s)
18:04 K4-713: re-enabled all queue consumers
17:56 ejegg: updated civicrm from e83c999f39e6ae847d9b48e38c8c825fc10d1635 to b6c350f620c8dc1f3410de179c19cbcbdeb62270
17:19 K4-713: disabled qc jobs and TY mail send for pending civi deploy
15:45 hashar: Updating our Jenkins job builder fork 686265a..ee80dbc (no job changed)
15:42 bblack: rebooting mexia
15:33 logmsgbot: demon Synchronized docroot/bits/favicon/wikipedia.ico: Favicons are my favorite icons, especially when they're only 18% of the size of the original (duration: 00m 04s)
15:16 logmsgbot: demon Synchronized php-1.25wmf1/extensions/Wikidata: (no message) (duration: 00m 11s)
15:14 logmsgbot: demon Synchronized php-1.25wmf1/extensions/VisualEditor: (no message) (duration: 00m 08s)
15:12 akosiaris: merging https://gerrit.wikimedia.org/r/#/c/163735/1, changing the LDAP master from sanger to ldap-mirror for inbound mail
15:12 andrewbogott: running sync-common on virt1000
15:12 logmsgbot: demon Synchronized visualeditor.dblist: (no message) (duration: 00m 04s)
15:11 logmsgbot: demon Synchronized visualeditor-default.dblist: (no message) (duration: 00m 04s)
15:06 logmsgbot: demon Synchronized wmf-config/Wikibase.php: (no message) (duration: 00m 04s)
15:06 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 06s)
14:26 _joe_: restarted apache on mw1196, lots of apc errors
14:22 logmsgbot: oblivian gracefulled all apaches
12:10 mark: Stopped exim daemon on mchenry
09:41 godog: removed obsolete /etc/puppet/hiera from strontium and palladium, /etc/puppet/hieradata is the new location
09:24 godog: reboot ms-be2001 as a test
04:18 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Sep 30 04:18:48 UTC 2014 (duration 18m 47s)
03:17 logmsgbot: LocalisationUpdate completed (1.25wmf1) at 2014-09-30 03:17:15+00:00
02:41 logmsgbot: ori Synchronized 503.html: Ia88b306ef: Make the 503 error page consistent with other 5xx error pages (duration: 00m 08s)
02:34 logmsgbot: LocalisationUpdate completed (1.24wmf22) at 2014-09-30 02:34:07+00:00
01:00 Krinkle: Jenkins connection seemed in order with integration-slave1007 and 8, but disconnecting and relaunching the slave agents immediately resulted in them getting jobs assigned. Cause unknown, problem resolved for now.
00:58 Krinkle: integration-slave1007 and integration-slave1008 have not gotten any jobs in the past 24h. integration-slave1006 however has gotten loads of action. Investigating load balancing issue.
00:24 mutante: linne - shutting down, revoking puppet cert, salt key, puppet/icinga ...
00:12 logmsgbot: maxsem Synchronized w/skins-1.5: (no message) (duration: 00m 03s)
00:12 MaxSem: https://gerrit.wikimedia.org/r/#/c/162520/ broke stuff, reverted
00:10 logmsgbot: maxsem Synchronized live-1.5: (no message) (duration: 00m 03s)

September 29

23:59 logmsgbot: maxsem Synchronized w: https://gerrit.wikimedia.org/r/#/c/162520/ (duration: 00m 03s)
23:58 logmsgbot: maxsem Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/163773/ (duration: 00m 03s)
23:50 logmsgbot: maxsem Synchronized php-1.25wmf1/extensions/MultimediaViewer/: second try... (duration: 00m 04s)
23:42 logmsgbot: maxsem Finished scap: SWATting a bunch of stuff (duration: 18m 44s)
23:32 andrewbogott: stopped apache, nova-scheduler, keystone, puppetmaster on virt0
23:31 bd808: /var/lib/jenkins-slave/tmpfs 100% full on lanthanum.eqiad.wmnet
23:26 andrewbogott: disabling puppet on virt0 so I can kill off services one by one...
23:23 logmsgbot: maxsem Started scap: SWATting a bunch of stuff
23:17 logmsgbot: maxsem Synchronized docroot/: <mutante> he killed the dolphin (duration: 00m 06s)
23:13 Reedy: dist-upgraded logstash1001 and rebooted
22:47 Reedy: dist-upgrade logstash1002 and reboot
22:36 Reedy: dist-upgrade on logstash1003 and rebooting
22:34 Reedy: restarted elasticsearch on logstash1003 post java upgrades
22:30 Reedy: packages upgraded on logstash1002
22:28 mutante: silver - shutting down, wait with wiping it for a few days, just in case
22:28 Reedy: packages upgraded on logstash1001
22:24 Reedy: elasticsearch upgraded to 1.3.2 on logstash1003
22:18 andrewbogott: renaming labs-ns1 to labs-ns0 and labs-ns2 to labs-ns1
22:02 Reedy: elasticsearch upgraded to 1.3.2 on logstash1002
22:01 mutante: silver - revoke puppet cert, salt-key, stopping services, disable monitoring
21:58 mutante: stopping udp2log-vumi on silver - not needed anymore per Yuvipanda
21:12 Reedy: elasticsearch upgraded to 1.3.2 on logstash1001
20:50 bd808: Ran sync-common on tmh1002.eqiad.wmnet for cscott's failed sync-dir there
20:49 bd808: Ran sync-common on tmh1001.eqiad.wmnet for cscott's failed sync-dir there
20:29 logmsgbot: cscott Synchronized wmf-config: Switch default PDF renderer to OCG (duration: 00m 15s)
20:04 subbu: deployed Parsoid version deed30b2
19:41 ottomata: restarted varnishkafka on cp3019 to troubleshoot drerrs
19:26 Reedy: doing rolling upgrade of elasticsearch on logstash100[1-3]
17:59 cscott: updated OCG to version 89d8f29a24295b05d0643abe976fea83b56575c9
17:58 logmsgbot: ori Synchronized php-1.24wmf22/includes/password/Pbkdf2Password.php: I3b0a1de69: Test for string in Pbkdf2Password::crypt() (duration: 00m 05s)
17:47 bblack: stopped powerdns and disabled puppet on virt1000 to prevent further cache pollution w/ bad data in public caches
15:57 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: SWAT enable collection extension svwikiversity (duration: 00m 06s)
15:53 hashar: Zuul jobs reregistered
15:46 hashar: Zuul lost all Jenkins jobs :(
15:24 logmsgbot: manybubbles Synchronized php-1.24wmf22/extensions/UploadWizard/: SWAT update UploadWizard (duration: 00m 05s)
15:17 logmsgbot: manybubbles Synchronized php-1.25wmf1/extensions/Wikidata/: SWAT update wikidata to fix hhvm issues. (duration: 00m 14s)
15:05 logmsgbot: manybubbles Synchronized wmf-config/wikitech.php: SWAT sync wikitech file - is a noop I believe (duration: 00m 05s)
15:02 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: SWAT fix config type of flow. (duration: 00m 06s)
13:32 hashar: Restarted Zuul
13:27 hashar: Zuul: tweaking configuration files 162584
09:31 godog: deployed new swift ring to eqiad-prod
08:21 hashar: Restarting Jenkins to have a plugin installed/loaded properly
03:25 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Sep 29 03:25:11 UTC 2014 (duration 25m 10s)
02:26 logmsgbot: LocalisationUpdate completed (1.25wmf1) at 2014-09-29 02:26:50+00:00
02:15 logmsgbot: LocalisationUpdate completed (1.24wmf22) at 2014-09-29 02:15:34+00:00
02:14 bblack: restarting squid on carbon (webproxy)

September 28

23:27 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: switch db1042 load groups to db1056 (duration: 00m 06s)
23:17 springle: powercycle db1042
23:15 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1042, locked up (duration: 00m 07s)
23:12 bblack: restarted apache on mw1123 + mw1196
23:11 bblack: test
23:11 bblack: restarted apache on mw1123 + mw1196
20:28 ori: Puppet failures appear to be caused by apt-get timeouts
10:09 _joe_: updated bash (again) across the whole cluster
03:24 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Sep 28 03:24:20 UTC 2014 (duration 24m 19s)
02:28 logmsgbot: LocalisationUpdate completed (1.25wmf1) at 2014-09-28 02:28:14+00:00
02:17 logmsgbot: LocalisationUpdate completed (1.24wmf22) at 2014-09-28 02:17:01+00:00

September 27

18:35 logmsgbot: ori Synchronized php-1.25wmf1/extensions/Scribunto/engines/LuaSandbox/Engine.php: Capture traces for bug 71045 (duration: 00m 13s)
18:35 logmsgbot: ori Synchronized php-1.24wmf22/extensions/Scribunto/engines/LuaSandbox/Engine.php: Capture traces for bug 71045 (duration: 00m 17s)
04:03 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Sep 27 04:03:25 UTC 2014 (duration 3m 24s)
02:46 logmsgbot: LocalisationUpdate completed (1.25wmf1) at 2014-09-27 02:46:53+00:00
02:25 logmsgbot: LocalisationUpdate completed (1.24wmf22) at 2014-09-27 02:25:05+00:00

September 26

23:26 mutante: switched noc.wikimedia.org to terbium, behind misc-web
22:01 bd808: sudo apache2ctl graceful on logstash100[123] for ldap revert
22:00 bd808: running puppet on logstash100[123] to revert ldap change
21:56 bd808: sudo apache2ctl graceful on logstash100[123] for ldap change
21:35 andrewbogott: gracefulled apache on neon
21:21 mutante: graceful'ed apache on neon
20:45 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: Disable HHVM beta-feature on wikidatawiki (duration: 00m 06s)
19:58 awight: update CRM from 25159fcfc29921b08de86f12121fb292139be09d to 3e42bac8cb7f58f5e504946f4944c69ca5553e60
19:42 mutante: removing root's public_html from fenari - backup kept just in case
19:15 AaronS: Deployed security patches to CentralAuth
19:09 Krinkle: git-deploy: Deploying integration/slave-scripts 08147c42ea42e1a5eca1d29
19:08 logmsgbot: aaron Synchronized php-1.25wmf1/extensions/CentralAuth: (no message) (duration: 00m 07s)
19:06 logmsgbot: aaron Synchronized php-1.24wmf22/extensions/CentralAuth: (no message) (duration: 00m 08s)
18:15 Nemo_bis: untruncated: andrewbogott> ldap is broken for gerrit, should be working elsewhere
18:14 legoktm: ldap is broken
18:09 K4-713: re-enabled donations queue consumer
17:50 awight: CRM queue consumer disabled
17:43 andrewbogott: upgraded libgnutls26 on ytterbium
17:35 andrewbogott: "git reset --hard origin" to remove that terrible hotfix on palladium and strontium.
17:28 awight: CRM jobs reenabled
17:22 hoo: Manually ran rebuildEntityPerPage.php for Wikidata
17:16 andrewbogott: hotfixing /var/lib/git/operations/puppet in hopes of fixing gerrit so I don't have to hotfix no more
17:08 awight: updated crm from 06c9546f9b68f6ecbaaf510944418aa52f9ed0fb to 25159fcfc29921b08de86f12121fb292139be09d
17:02 awight: disabling CRM jobs for deployment...
15:29 andrewbogott: puppet is now moving all labs instances to new ldap servers: ldap-eqiad and ldap-codfw
15:02 cscott: documented how I'm going to clear the OCG queues at https://wikitech.wikimedia.org/wiki/OCG#Pruning_the_queue
14:36 bblack: address for ns1 switched in our local dns data - https://gerrit.wikimedia.org/r/163164
13:57 hoo: Manually declared the global rename Secretary -> VlsergeyBot done after it twice timed out on page moves on ruwiki
13:39 akosiaris: moved mathoid to low-traffic lvs servers@eqiad
12:48 cscott: cleared OCG caches again when I woke up to buy me more time to investigate the issue properly.
08:44 awight: rollback: revision for civicrm locked to 06c9546f9b68f6ecbaaf510944418aa52f9ed0fb
08:30 _joe_: updated hhvm on mw1053, kicked the jr a couple of times, working again now
08:29 awight: large_donation schema migration 7000
08:28 awight: skip over wmf_civicrm schema migration 7022 -- *why* did I make that unsafe
08:24 awight: fundraising_code_update: revision for civicrm changed from 06c9546f9b68f6ecbaaf510944418aa52f9ed0fb to 5aca00fd4573f0fe8f385baa7238172f6ae54438
08:19 awight: disabling CRM jobs during deployment
08:09 cscott: cleared OCG queues and cache to quiet icinga; will try to get to the root cause tomorrow.
07:41 hashar: Updated our Jenkins Job Builder fork 2d74b16..686265a
07:06 logmsgbot: ori Synchronized php-1.24wmf22/extensions/WikimediaEvents: Update WikimediaEvents for 791e14cfc1d (duration: 00m 05s)
06:53 logmsgbot: ori Synchronized php-1.25wmf1/extensions/WikimediaEvents: Update WikimediaEvents for 0e087daea5 (duration: 00m 07s)
06:41 cscott: updated OCG to version f3a6c1cbba118d4a5e1aa019937dc50159fc823d
04:43 _joe_: updating bash, USN-2363
04:10 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Sep 26 04:10:12 UTC 2014 (duration 10m 11s)
03:09 logmsgbot: LocalisationUpdate completed (1.25wmf1) at 2014-09-26 03:09:47+00:00
02:36 logmsgbot: LocalisationUpdate completed (1.24wmf22) at 2014-09-26 02:36:45+00:00
00:14 awight: turning off Civi jobs before deployment

September 25

23:31 logmsgbot: maxsem Synchronized php-1.25wmf1/skins/Vector/: https://gerrit.wikimedia.org/r/#/c/163021/ (duration: 00m 03s)
23:15 logmsgbot: maxsem Synchronized php-1.25wmf1/extensions/CentralAuth/: https://gerrit.wikimedia.org/r/#/c/162971/ (duration: 00m 04s)
23:12 logmsgbot: maxsem Synchronized php-1.25wmf1/includes/resourceloader/ResourceLoaderSiteModule.php: https://gerrit.wikimedia.org/r/#/c/163024/ (duration: 00m 03s)
23:10 logmsgbot: maxsem Synchronized php-1.25wmf1/includes/api/ApiQueryAllUsers.php: https://gerrit.wikimedia.org/r/#/c/163027/ (duration: 00m 03s)
23:08 logmsgbot: maxsem Synchronized php-1.24wmf22/includes/api/ApiQueryAllUsers.php: https://gerrit.wikimedia.org/r/#/c/163026/ (duration: 00m 03s)
23:02 logmsgbot: maxsem Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/163048 (duration: 00m 03s)
22:58 logmsgbot: ori Synchronized php-1.24wmf22/extensions/Wikidata: Update Wikidata for I0acd2096d21b (duration: 00m 11s)
21:41 mutante: powercycling mw1053
20:36 mutante: no !log
20:36 legoktm: manually migrated "NickK" to a global account
20:29 mutante: repooled mw1051
19:49 bd808: Restarted logstash on logstash1001. udp2log events were not being recorded.
19:30 logmsgbot: reedy Synchronized php-1.25wmf1/: (no message) (duration: 00m 46s)
19:24 logmsgbot: reedy Synchronized php-1.24wmf22/resources/src/mediawiki.ui/components/buttons.less: (no message) (duration: 00m 14s)
19:22 bblack: ntp work done on hosts
19:18 logmsgbot: reedy Synchronized php-1.25wmf1/: (no message) (duration: 00m 55s)
19:17 logmsgbot: reedy Synchronized php-1.24wmf22/extensions/CentralAuth/: (no message) (duration: 00m 14s)
18:55 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 14s)
18:47 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 14s)
18:20 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.25wmf1
18:08 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias to 1.24wmf22
17:20 logmsgbot: reedy Finished scap: testwiki to 1.25wmf1 and build l10n cache (duration: 28m 36s)
16:52 logmsgbot: reedy Started scap: testwiki to 1.25wmf1 and build l10n cache
16:41 Reedy: Purged php-1.24wmf9
16:38 logmsgbot: reedy Purged l10n cache for 1.24wmf20
15:31 bblack: testing ntpd changes on acamar, achernar, chromium, hydrogen, nescio, and baham (puppet-agent disabled)
15:19 logmsgbot: mattflaschen Synchronized wmf-config/CommonSettings.php: Extend GettingStarted bucketing period end date to Sept. 28 (duration: 00m 07s)
12:36 godog: update bash on elastic1014 analytics1021 elastic1013
11:33 _joe_: gracefully reloaded apache on mw1139 and mw1199, apc issues
11:29 logmsgbot: aude Synchronized php-1.24wmf22/extensions/Wikidata/extensions/Wikibase/lib/config/WikibaseLib.default.php: fix apc issues (duration: 00m 06s)
11:03 _joe_: updated bash on elastic1007
10:57 godog: upgraded bash on labsdb1003
10:31 Nemo_bis: SAL is here
09:22 godog: graphite temporarily down, fix incoming
06:13 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1062 (duration: 00m 07s)
03:58 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Sep 25 03:58:02 UTC 2014 (duration 58m 1s)
03:02 logmsgbot: LocalisationUpdate completed (1.24wmf22) at 2014-09-25 03:02:46+00:00
02:32 logmsgbot: LocalisationUpdate completed (1.24wmf21) at 2014-09-25 02:32:56+00:00
02:08 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: Use Debian-packaged texvc on Trusty app servers (duration: 00m 04s)
01:39 ori: gracefuling apaches
00:55 mutante: icinga - manually deleted duplicate host labs-ns1 to fix icinga config and reloads

September 24

23:21 ejegg: Updated paymentswiki from 3ac5dd1c3fade37b6f3a4879aef8ea71b3bbbf08 to 83464deed3b66da655ca5d1086852237c4793b71
23:17 logmsgbot: catrope Synchronized php-1.24wmf22/extensions/VisualEditor: SWAT (duration: 00m 04s)
23:14 logmsgbot: catrope Synchronized php-1.24wmf22/resources/lib/oojs-ui/: SWAT (duration: 00m 05s)
23:12 greg-g: restarted jouncebot, he wasn't announcing deploy windows
23:00 mutante: OCG - scheduled downtime/disabled notifications for LVS check
22:44 andrewbogott: salted a bash update on labs instances, which turned out to be updated already.
22:09 cscott: icinga VS HTTP IPv4 on ocg.svc.eqiad.wmnet test is most likely due to `du -s` of a 6G cache directory, not critical. timeouts can be increased to quiet it. i will look into adding a -quick parameter or some such tomorrow to make the health check faster.
20:56 cscott: updated OCG to version 48acb8a2031863e35fad9960e48af60a3618def9
20:43 logmsgbot: aaron Synchronized php-1.24wmf22/includes/cache/bloom: ad8a7a761d5f3bd086bbd6c88870e83c701e59e3 (duration: 00m 04s)
20:00 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 15s)
19:47 logmsgbot: yurik Synchronized php-1.24wmf22/extensions/ZeroBanner/: Updating to master (duration: 01m 10s)
19:46 logmsgbot: yurik Synchronized php-1.24wmf21/extensions/ZeroBanner/: Updating to master (duration: 01m 07s)
19:14 logmsgbot: yurik Finished scap: updating Graph, JsonConfig, ZeroBanner & ZeroPortal to master for 21 & 22 (duration: 07m 46s)
19:07 logmsgbot: yurik Started scap: updating Graph, JsonConfig, ZeroBanner & ZeroPortal to master for 21 & 22
18:55 logmsgbot: reedy Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 14s)
18:53 logmsgbot: reedy Synchronized php-1.24wmf22/extensions/WikimediaMaintenance: (no message) (duration: 00m 14s)
17:13 manybubbles: lowered the throttle on Elasticsearch index transfer speed from one node to another because I hate excitement
15:38 Nemo_bis: cscott> i'm working on the OCG health issue above. i'll let you know when i know what's going on. icinga-wm> PROBLEM - OCG health on ocg1002 is CRITICAL
15:37 logmsgbot: demon Synchronized php-1.24wmf22/extensions/CentralAuth: (no message) (duration: 00m 05s)
15:21 logmsgbot: demon Synchronized php-1.24wmf22/extensions/CirrusSearch/maintenance/updateOneSearchIndexConfig.php: (no message) (duration: 00m 05s)
15:01 logmsgbot: demon Synchronized wmf-config/Wikibase.php: (no message) (duration: 00m 06s)
14:57 Jeff_Green: restarted service ocg on ocg1001
14:40 manybubbles: finished deployment - load spikes look to be gone. yay
14:22 logmsgbot: manybubbles Synchronized php-1.24wmf21/extensions/CirrusSearch/: Switch implementation of Cirrus link counting jobs to hopefully lower overall load. (duration: 00m 04s)
14:21 logmsgbot: manybubbles Synchronized wmf-config: More cirrus config to lower load (duration: 00m 04s)
14:17 logmsgbot: manybubbles Synchronized wmf-config: Cirrus config to lower load (duration: 00m 04s)
14:14 logmsgbot: manybubbles Synchronized php-1.24wmf22/extensions/CirrusSearch/: Switch implementation of Cirrus link counting jobs to hopefully lower overall load. (duration: 00m 06s)
14:08 manybubbles: starting deployment to lower cirrus load spikes
13:19 manybubbles: *disabled*
13:17 manybubbles: disable row awareness on Cirrus's elasticsearch cluster - might help balance load better. too much load was on one row
13:04 hashar: Zuul processing queue again
13:00 hashar: Jenkins: disconnecting Gearman client from Zuul and reconnecting
12:59 hashar: Zuul / Jenkins stuck
09:33 hashar_: Jenkins switched mwext-UploadWizard-qunit back to Zuul cloner by applying pending change 161459
09:33 hashar_: restarting zuul-merger
09:32 hashar_: restarting zuul
09:19 hashar_: Upgrading Zuul to f0e3688 Cherry pick https://review.openstack.org/#/c/123437/1 which fixes bug 71133 Zuul cloner: fails on extension jobs against a wmf branch
05:41 legoktm: ran script to back populate bug 70620 on metawiki (/home/legoktm/ca/populateBug70620.php on terbium)
04:29 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Sep 24 04:29:53 UTC 2014 (duration 29m 52s)
03:34 logmsgbot: tstarling Finished scap: (no message) (duration: 12m 09s)
03:22 logmsgbot: tstarling Started scap: (no message)
03:21 logmsgbot: tstarling scap failed: RuntimeError scap requires SSH agent forwarding (duration: 00m 00s)
03:12 logmsgbot: LocalisationUpdate completed (1.24wmf22) at 2014-09-24 03:12:54+00:00
02:39 logmsgbot: LocalisationUpdate completed (1.24wmf21) at 2014-09-24 02:39:39+00:00
02:10 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1062 (duration: 00m 06s)
01:25 mutante: tridge - shutting down

September 23

23:47 logmsgbot: maxsem Synchronized php-1.24wmf22/extensions/MobileFrontend/: (no message) (duration: 00m 04s)
23:15 logmsgbot: maxsem Synchronized wmf-config/CommonSettings.php: fail! (duration: 00m 04s)
23:12 logmsgbot: maxsem Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/162297/ (duration: 00m 03s)
23:06 logmsgbot: maxsem Synchronized php-1.24wmf21/extensions/MassMessage/: https://gerrit.wikimedia.org/r/#/c/161002/ (duration: 00m 03s)
22:04 logmsgbot: aaron Synchronized php-1.24wmf22/includes/jobqueue/JobRunner.php: f23f1ad35f02f6a17c9b5842aa6d8c152a273639 (duration: 00m 04s)
21:54 logmsgbot: ebernhardson Finished scap: Bump flow submodule (and change an i18n message) in 1.24wmf21 and 1.24wmf22 (duration: 28m 14s)
21:25 logmsgbot: ebernhardson Started scap: Bump flow submodule (and change an i18n message) in 1.24wmf21 and 1.24wmf22
20:24 cscott: updated OCG to version 1cf9281ec3e01d6cbb27053de9f2423582fcc156
19:38 mutante: stopped etherpad, added repairPad.js, attempted repair of pad 'WRN201409', started etherpad
18:30 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 14s)
18:28 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 16s)
18:14 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias to 1.24wmf22
16:59 logmsgbot: aaron Synchronized wmf-config/jobqueue-eqiad.php: Removed redundant config due to new job runner (duration: 00m 05s)
16:29 _joe_: manually created /srv/mediawiki bind mount on searchidx1001; moved old contents to /a/mediawiki-stale, to avoid filling the disk
15:33 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: SWAT remove C and MW namespace aliases from ckbwiki (duration: 00m 07s)
15:24 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: SWAT add *.beeldbank.cultureelerfgoed.nl to upload list (duration: 00m 04s)
15:16 logmsgbot: manybubbles Synchronized php-1.24wmf21/extensions/CirrusSearch/: SWAT update Cirrus for better error handling (duration: 00m 04s)
15:08 logmsgbot: manybubbles Synchronized php-1.24wmf22/extensions/CirrusSearch/: SWAT deploy cirrus backports (duration: 00m 05s)
13:48 akosiaris: change url-downloader ip to point to the new one
13:01 logmsgbot: manybubbles Synchronized wmf-config/: Throttle cirrus jobs some more. (duration: 00m 04s)
12:24 logmsgbot: manybubbles Synchronized wmf-config/: Some new cirrus config (duration: 00m 07s)
09:16 godog: deployed codfw-prod swift ring to palladium
04:49 logmsgbot: tstarling Synchronized php-1.24wmf21/languages/Language.php: profiling (duration: 00m 05s)
04:10 logmsgbot: tstarling Synchronized php-1.24wmf21/languages/Language.php: profiling (duration: 00m 05s)
03:42 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Sep 23 03:42:29 UTC 2014 (duration 42m 28s)
03:29 logmsgbot: tstarling Synchronized wmf-config/CommonSettings.php: fix profiling (duration: 00m 07s)
02:43 logmsgbot: LocalisationUpdate completed (1.24wmf22) at 2014-09-23 02:43:48+00:00
02:30 logmsgbot: LocalisationUpdate completed (1.24wmf21) at 2014-09-23 02:30:38+00:00
02:17 logmsgbot: LocalisationUpdate completed (1.24wmf20) at 2014-09-23 02:17:31+00:00
00:26 mutante: tridge - revoking puppet cert, deleting salt key, decom ...

September 22

23:49 logmsgbot: ebernhardson Synchronized php-1.24wmf22/extensions/LiquidThreads/: Bump LiquidThreads submodule in 1.24wmf22 (duration: 00m 06s)
23:48 logmsgbot: ebernhardson Synchronized php-1.24wmf22/extensions/UploadWizard/: Bump UploadWizard submodule in 1.24wmf22 (duration: 00m 04s)
23:46 logmsgbot: ebernhardson Synchronized php-1.24wmf21/extensions/LiquidThreads/: Bump LQT submodule in 1.24wmf21 (duration: 00m 04s)
23:35 logmsgbot: ebernhardson Synchronized php-1.24wmf21/extensions/UploadWizard/: sync UploadWizard in 1.24wmf21 (duration: 00m 07s)
23:32 logmsgbot: ebernhardson Synchronized php-1.24wmf21/includes/rcfeed/MachineReadableRCFeedFormatter.php: Use safe attribute accessor for RecentChange (duration: 00m 04s)
23:30 logmsgbot: ebernhardson Synchronized php-1.24wmf21/extensions/UploadWizard/: Bump UploadWizard submodule in php-1.24wmf21 (duration: 00m 04s)
23:30 logmsgbot: ebernhardson Synchronized php-1.24wmf21/extensions/Flow/: Bump flow submodule in php-1.24wmf21 (duration: 00m 06s)
23:17 logmsgbot: ebernhardson Synchronized wmf-config/InitialiseSettings.php: Set wgUploadNavigationUrl for eowiki (duration: 00m 05s)
21:04 bd808: production-logstash-eqiad healed by restarting elasticsearch on logstash1002 after OOM + split brain
20:54 bd808: split brain on logstash1002 preceded by java OOM for elasticsearch
20:52 bd808: logstash1002 went split brain from rest of logstash elastic search cluster. restarting
20:24 subbu: deployed Parsoid ff9476f9
19:31 hashar: Jenkins is broken for extensions patches proposed against the wmf branches bug 71133
18:32 Krinkle: lanthanum tmpfs filled up again, purged manually (bug 71128)
17:22 ori: updated HHVM on beta cluster to 3.3.0-20140918+wmf1
17:00 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: Push Cirrus' non-content enwiki shards apart (no-op) (duration: 00m 04s)
15:52 godog: reboot ms-be2001 into PXE to test a re-install
15:07 logmsgbot: anomie Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable Graph extension on mediawiki.org gerrit:161908 (duration: 00m 09s)
15:02 logmsgbot: anomie Synchronized wmf-config/InitialiseSettings.php: SWAT: Add securepoll-create-poll right to sysop on testwiki gerrit:161653 (duration: 00m 09s)
15:01 logmsgbot: anomie Synchronized wmf-config/CommonSettings.php: SWAT: Add REL1_24 as branch in ExtensionDistributor gerrit:161666 (duration: 00m 10s)
14:12 hashar: Jenkins deleted job mediawiki-core-lint , replaced by mediawiki-core-phplint
12:10 apergos: shutdown of db1050 to install trusty
10:04 hashar: Jenkins back and fully operational
09:55 hashar: restarting jenkins
09:37 hashar_: Jenkins: deleting old mediawiki extensions jobs (rm -fR /var/lib/jenkins/jobs/*testextensions-master). They are no longer triggered and are superseded by the *-testextension jobs.
03:36 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Sep 22 03:36:40 UTC 2014 (duration 36m 39s)
02:41 logmsgbot: LocalisationUpdate completed (1.24wmf22) at 2014-09-22 02:41:29+00:00
02:29 logmsgbot: LocalisationUpdate completed (1.24wmf21) at 2014-09-22 02:29:09+00:00
02:16 logmsgbot: LocalisationUpdate completed (1.24wmf20) at 2014-09-22 02:16:20+00:00

September 21

22:43 ori: ms-be1008 overloaded starting 18:00:24 UTC, syslog says "BUG: soft lockup - CPU#1 stuck for 22s! [kworker/1:1:2196]". machine became unresponsive at 21:35, coinciding with a spike of 5xxs, lasting until Coren powercycled it at 22:10.
03:37 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Sep 21 03:37:31 UTC 2014 (duration 37m 30s)
03:16 springle: labsdb1001 mysqld restarted in gdb; crash loop with a labs user's table
02:46 logmsgbot: ori Synchronized wmf-config/throttle.php: I7bb42b49a: Increase account creation throttle on enwiki for Cochrane colloquium. (duration: 00m 07s)
02:41 logmsgbot: LocalisationUpdate completed (1.24wmf22) at 2014-09-21 02:41:36+00:00
02:29 logmsgbot: LocalisationUpdate completed (1.24wmf21) at 2014-09-21 02:29:51+00:00
02:16 logmsgbot: LocalisationUpdate completed (1.24wmf20) at 2014-09-21 02:16:56+00:00

September 20

22:28 Krinkle: Reloading Zuul to deploy I0170766cfc06b8e6
20:30 andrewbogott: rebooting virt1006 to make good and sure it doesn't spontaneously re-enter the compute pool
20:29 andrewbogott_afk: moved all VMs off of virt1006, disabled compute service
03:46 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Sep 20 03:46:00 UTC 2014 (duration 45m 59s)
02:46 logmsgbot: LocalisationUpdate completed (1.24wmf22) at 2014-09-20 02:46:05+00:00
02:33 logmsgbot: LocalisationUpdate completed (1.24wmf21) at 2014-09-20 02:33:34+00:00
02:19 logmsgbot: LocalisationUpdate completed (1.24wmf20) at 2014-09-20 02:19:34+00:00

September 19

22:16 RoanKattouw: Restarting Jenkins
21:57 logmsgbot: spage Synchronized php-1.24wmf21/extensions/Flow/modules/new/components/flow-board.js: Flow bug 71054 backport (duration: 00m 04s)
20:50 ori: restarted HHVM and cleared bytecode cache on all HHVM app servers
20:47 _joe_: restarted hhvm on mw1018, cleaning the cache as well
20:25 ori: Deployed Ic71064e08 (type hint fix for Wikidata) to wmf21/22.
19:09 bblack: restarted hhvm on mw1021
18:59 _joe_: rolling restart of hhvm servers
18:22 bblack: restarting hhvm on mw1020 (again!)
18:19 hashar: Jenkins: reverting job mwext-VisualEditor-qunit to previous state (i.e. without Zuul cloner)
18:17 bblack: restarting hhvm on mw1020
17:57 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: I3e1bd5e4bb: Don't manipulate the environment to determine TZ offset (Bug: 71036) (duration: 00m 13s)
17:30 bblack: turned down apache prefork procs on fenari to reduce swapping
17:16 ottomata: initiating controlled shutdown of kafka broker analytics1021 to test some kafkatee weirdness, as well as a potential kafka/zookeeper bug
17:07 bblack: restarting apache on fenari
16:21 bblack: restarted hhvm on mw1019 + 1021
14:57 hashar: Jenkins friday deploy: migrate all MediaWiki extension qunit jobs to Zuul cloner.
14:37 akosiaris: initiated rsync of tridge data that is to be kept to nas1001-a
13:56 springle: killing any sleeping connection on enwiki db slaves to make room
13:56 mark: Stopped jobrunners on mw1001-1003
12:36 springle: temporarily disable log fsync on enwiki slaves
12:14 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1072 with ReadAheadNone (duration: 00m 09s)
11:32 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1072. seems more susceptible to replag; find out why. (duration: 00m 10s)
09:14 _joe_: restarted hhvm on mw1053, stuck at 100% cpu since last restart (activating stats)
05:01 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Sep 19 05:01:54 UTC 2014 (duration 1m 52s)
03:45 logmsgbot: LocalisationUpdate completed (1.24wmf22) at 2014-09-19 03:45:33+00:00
03:11 logmsgbot: LocalisationUpdate completed (1.24wmf21) at 2014-09-19 03:11:43+00:00
02:38 logmsgbot: LocalisationUpdate completed (1.24wmf20) at 2014-09-19 02:38:25+00:00
00:43 cscott: updated OCG to version ce16f7adb60d7c77409e2e11ba0e5d6cce6955d5

September 18

23:55 logmsgbot: ori Started scap: Add HHVM as a beta feature
23:54 logmsgbot: ori Synchronized wmf-config/InitialiseSettings.php: I2466f6b6e: Add HHVM to beta feature whitelist (duration: 00m 08s)
23:52 logmsgbot: ori Synchronized php-1.24wmf22/extensions/WikimediaEvents: Update WikimediaEvents for cherry-picks (duration: 00m 06s)
23:51 logmsgbot: ori Synchronized php-1.24wmf21/extensions/WikimediaEvents: Update WikimediaEvents for cherry-picks (duration: 00m 06s)
23:25 logmsgbot: catrope Synchronized php-1.24wmf22/resources/lib/oojs-ui/: oojs-ui bugfixes (duration: 00m 06s)
23:13 logmsgbot: catrope Synchronized php-1.24wmf22/extensions/VisualEditor/: SWAT (duration: 00m 08s)
23:04 logmsgbot: catrope Synchronized php-1.24wmf21/extensions/UploadWizard/: SWAT (duration: 00m 08s)
19:57 Jeff_Green: iridium.wm.o exim conf checked, puppet reenabled
19:54 Jeff_Green: magnesium.wm.o exim conf checked, puppet reenabled
19:50 Jeff_Green: sodium.wm.o exim conf checked, puppet reenabled
19:48 logmsgbot: reedy Synchronized php-1.24wmf22/extensions/Flow/: (no message) (duration: 00m 16s)
19:45 Jeff_Green: iodine.wm.o exim conf checked, puppet reenabled
19:44 Jeff_Green: polonium.wm.o exim conf checked, puppet reenabled
19:35 Jeff_Green: lead.wm.o exim conf checked, puppet reenabled
19:22 logmsgbot: reedy Synchronized php-1.24wmf22: (no message) (duration: 00m 57s)
19:16 Jeff_Green: disabling puppet on polonium, lead, sodium, iridium, magnesium, and iodine to monitor rollout of https://gerrit.wikimedia.org/r/155753
19:05 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: rest of group0 to 1.24wmf22
19:01 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikipedias to 1.24wmf21
18:59 bblack: restarting apache on fenari
18:49 logmsgbot: reedy Finished scap: testwiki to 1.24wmf22 and build l10n cache (duration: 30m 23s)
18:44 Jeff_Green: testing exim configuration change on lead.wm.o
18:18 logmsgbot: reedy Started scap: testwiki to 1.24wmf22 and build l10n cache
17:49 logmsgbot: reedy Started scap: testwiki to 1.24wmf22 and build l10n cache
17:08 cmjohnson1: replacing failed disk es1005
17:05 logmsgbot: yurik Finished scap: (no message) (duration: 23m 26s)
16:43 yurikR: yurik scaping zero - partner needs an l10n message asap
16:42 logmsgbot: yurik Started scap: (no message)
15:38 hashar: restarting Zuul just to be safe
15:06 logmsgbot: anomie Synchronized php-1.24wmf21/resources/src/mediawiki.action/mediawiki.action.view.redirectPage.css: SWAT: mediawiki.action.view.redirectPage: Correct a CSS selector gerrit:161239 (duration: 00m 23s)
15:01 logmsgbot: anomie Synchronized php-1.24wmf21/extensions/Wikidata/: SWAT: Update Wikidata to fix broken xml api output gerrit:161232 (duration: 00m 38s)
11:40 apergos: forgot to log this earlier: manually started salt minion on radon, elastic1015, searchidx1001, it wasn't running there
09:00 godog: updated authdns to 0c2225d
08:56 springle: xtrabackup clone db1016 to db2010
07:48 godog: re-enabled icinga notifications for ms-be1001
07:09 bblack: removing pybal cfg "eqiad/misc_web_https" (unused now, https://gerrit.wikimedia.org/r/161183)
06:53 bblack: removing pybal cfg "esams/wikimedialbsecure" (unused, points at maerlant)
06:47 bblack: removing pybal symlink "$site/ipv6", also unused (old ipv6 protoproxying)
06:45 bblack: removing pybal symlink "$site/text-varnish", seems to be a remnant no longer in use
04:20 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Sep 18 04:20:56 UTC 2014 (duration 20m 55s)
03:09 logmsgbot: LocalisationUpdate completed (1.24wmf21) at 2014-09-18 03:09:44+00:00
02:53 logmsgbot: yurik Synchronized wmf-config/CommonSettings.php: (no message) (duration: 01m 53s)
02:52 yurikR: yurik Fixing graph ext namespace name - otherwise get screen of WMF death on graph: ns visits
02:36 logmsgbot: LocalisationUpdate completed (1.24wmf20) at 2014-09-18 02:36:46+00:00
00:32 logmsgbot: marktraceur Finished scap: [SWAT] Move things out of assets/ and into resources/assets/ (duration: 35m 28s)

September 17

23:57 logmsgbot: marktraceur Started scap: [SWAT] Move things out of assets/ and into resources/assets/
23:47 logmsgbot: marktraceur Synchronized wmf-config/InitialiseSettings.php: [SWAT] Enable Graph on metawiki and labswiki (duration: 00m 10s)
23:42 logmsgbot: marktraceur Synchronized php-1.24wmf21/extensions/Graph/: [SWAT] Update Graph to master (duration: 00m 08s)
23:41 logmsgbot: marktraceur Synchronized php-1.24wmf20/extensions/Graph/: [SWAT] Update Graph to master (duration: 00m 07s)
23:35 logmsgbot: marktraceur Synchronized php-1.24wmf21/extensions/MultimediaViewer/: [SWAT] Fix reuse dropdown message weirdness (duration: 00m 07s)
23:29 logmsgbot: marktraceur Synchronized php-1.24wmf20/extensions/MultimediaViewer/: [SWAT] Fix reuse dropdown message weirdness (duration: 00m 08s)
23:10 logmsgbot: marktraceur Synchronized php-1.24wmf21/extensions/UploadWizard/: [SWAT] Fix EventLogging schema declarations for UploadWizard (duration: 00m 11s)
21:41 mutante: fixing updates on planet feeds - file permissions
21:11 manybubbles: restarting the rebuild of cirrus's enwiki index now that I've found the reason it wasn't working before - the new index was putting too many shards on an already full node and overwhelming it. silly allocation algorithm! that's a bad idea!
21:07 logmsgbot: yurik Synchronized php-1.24wmf21/extensions/ZeroPortal/: (no message) (duration: 01m 05s)
20:19 godog: rebooting ms-be1006
19:00 Krinkle: jenkins-slave tmpfs on lanthanum was filling up (> 500MB). I purged tmp dbs for old jobs. We should get these purged automatically and also increase the size as 500MB is too little.
18:59 robh: disabled icinga alerts for ms-be1001, rebooting it to look at its raid bios settings for codfw deployment mirroring
18:47 logmsgbot: yurik Synchronized php-1.24wmf20/extensions/: update to JsonConfig, ZeroBanner, ZeroPortal (duration: 01m 39s)
18:43 logmsgbot: yurik Synchronized php-1.24wmf21/extensions/: update to JsonConfig, ZeroBanner, ZeroPortal (duration: 01m 35s)
18:40 logmsgbot: yurik Synchronized wmf-config/: private wikis login/logout page names, zeroportal impersonator acct (duration: 01m 06s)
18:23 mutante: phabricator - made aklapper an admin
17:26 logmsgbot: andrew rebuilt wikiversions.cdb and synchronized wikiversions files: (no message)
17:23 logmsgbot: andrew Synchronized wikiversions.json: (no message) (duration: 00m 05s)
17:04 manybubbles: cirrus brownout looks just about fixed. So! My plan for periodically explicitly merging deletes has some problems.....
16:42 gwicke: restarted parsoid on wtp102{2,3,4}
16:31 manybubbles: just going to make this clear - the current cirrus brownout doesn't seem to be affecting my queries but we're getting hit with pool counter full events - sadness. it's not caused by switching cirrus to ruwiki's primary backend - it's caused by me attempting to perform index maintenance activities.
16:23 akosiaris: restarted node on wtp boxes except wtp1022,wtp1023,wtp1024
16:23 manybubbles: caused cirrus brownout by executing a force merge for enwiki's general index. ooops
16:06 logmsgbot: manybubbles Synchronized wmf-config/: set cirrus as primary search backend for ruwiki and make permanent some settings set on the fly (duration: 00m 06s)
15:57 manybubbles: manually pushed apart ruwiki and nlwiki's shards as well - might help - updated commit to reflect that
15:42 manybubbles: gerrit change to lock that into place is https://gerrit.wikimedia.org/r/#/c/160974/ and I'll deploy it in my window in 15 minutes.
15:41 manybubbles: manually forcing the shards of Cirrus's commonswiki file index apart from one another in an attempt to lower the consistently high load on elastic1013
15:34 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: Set wgMetaNamespace for labswiki (duration: 00m 14s)
14:54 springle: db1062 out of action for bug hunt https://mariadb.atlassian.net/browse/MDEV-6751
14:48 logmsgbot: reedy Synchronized wmf-config/interwiki.cdb: (no message) (duration: 00m 16s)
14:45 godog: restarted apache2 on magnesium, validate removal of ssl certs
13:38 hashar: Zuul upgraded successfully apparently.
13:33 hashar: stopping zuul for upgrade
13:29 hashar: upgrading Zuul to 2.0.0.286.gb1811ab
12:20 hashar: upgrading jenkins 1.565.1 -> 1.565.2
09:53 akosiaris: stopped apache2 on fenari, it was leaking memory, puppet restarted it, need to kill this machine ASAP
09:52 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool s1 db1061 (duration: 00m 08s)
06:55 springle: xtrabackup clone db1061 to db2016
06:52 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool s1 db1061 for codfw cloning (duration: 00m 07s)
06:27 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool s7 db1039 (duration: 00m 08s)
04:34 logmsgbot: tstarling Synchronized docroot/bits: (no message) (duration: 00m 10s)
04:32 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Sep 17 04:32:17 UTC 2014 (duration 32m 16s)
03:17 logmsgbot: LocalisationUpdate completed (1.24wmf21) at 2014-09-17 03:17:38+00:00
03:07 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool s6 db1015 (duration: 01m 41s)
02:43 logmsgbot: LocalisationUpdate completed (1.24wmf20) at 2014-09-17 02:43:02+00:00
02:21 springle: xtrabackup clone db1048 to db2012
02:15 springle: xtrabackup clone db1046 to db2011
02:00 springle: xtrabackup clone db1016 to db2010
01:54 springle: xtrabackup clone db1031 to db2009
01:33 springle: xtrabackup clone db1039 to db2029
01:33 springle: xtrabackup clone db1015 to db2028
01:29 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool s6 db1015 and s7 db1039 (duration: 00m 20s)
01:15 Reedy: updateCollation on shwiki done
00:59 Reedy: running `mwscript updateCollation.php --wiki=shwiki --previous-collation=uppercase` in screen on tin
00:58 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: shwiki collation (duration: 00m 16s)
00:53 Reedy: updateCollation on etwiki done
00:52 Reedy: updateCollation on etwiktionary done
00:48 Reedy: running `mwscript updateCollation.php --wiki=etwiktionary --previous-collation=uppercase` in screen on tin
00:47 Reedy: etwikisource collation updated (9918 rows)
00:47 Reedy: etwikiquote collation updated (706 rows)
00:46 Reedy: etwikimedia collation updated (121 rows)
00:46 Reedy: etwikibooks collation updated (280 rows)
00:45 Reedy: running `mwscript updateCollation.php --wiki=etwiki --previous-collation=uppercase` in screen on tin
00:45 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: et collations (duration: 00m 15s)
00:43 Reedy: updateCollation on frwikiversity done
00:42 Reedy: running `mwscript updateCollation.php --wiki=frwikiversity --previous-collation=uppercase` in screen on tin
00:42 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: frwikiversity collation (duration: 00m 17s)
00:40 Reedy: updateCollation on skwiki done
00:26 Reedy: Running `mwscript updateCollation.php --wiki=skwiki --previous-collation=uppercase` in screen on tin
00:25 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: skwiki collation (duration: 00m 15s)
00:18 logmsgbot: reedy Synchronized database lists: (no message) (duration: 00m 15s)

September 16

23:22 logmsgbot: maxsem Synchronized php-1.24wmf21/extensions/VisualEditor/: (no message) (duration: 00m 04s)
23:21 logmsgbot: maxsem Synchronized php-1.24wmf20/extensions/VisualEditor/: (no message) (duration: 00m 04s)
23:16 logmsgbot: maxsem Synchronized php-1.24wmf21/extensions/Wikidata: (no message) (duration: 00m 24s)
23:15 MaxSem: Wikidata submodule in wmf21 was in the middle of rebase - reset and updating to a newer submodule commit
23:12 logmsgbot: maxsem Synchronized php-1.24wmf20/extensions/Wikidata: (no message) (duration: 00m 17s)
23:07 logmsgbot: maxsem Synchronized php-1.24wmf21/extensions/GettingStarted: https://gerrit.wikimedia.org/r/#/c/160084/ (duration: 00m 08s)
21:36 Jeff_Green: SPF record deployed for donate.wikimedia.org
21:01 logmsgbot: ejegg Synchronized php-1.24wmf20/extensions/CentralNotice/modules/ext.centralNotice.bannerController/bannerController.js: (no message) (duration: 00m 06s)
19:38 csteipp: deployed patches for bugs 70469 and 70672
19:17 logmsgbot: catrope Synchronized php-1.24wmf21/extensions/VisualEditor/: Revert IE hacks so Firefox will stop corrupting non-Latin characters (duration: 00m 06s)
19:15 logmsgbot: catrope Synchronized php-1.24wmf20/extensions/VisualEditor/: (no message) (duration: 00m 09s)
18:32 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 15s)
18:11 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias to 1.24wmf21
17:03 logmsgbot: bd808 Finished scap: No code change scap to test scap internal update (duration: 18m 06s)
16:45 logmsgbot: bd808 Started scap: No code change scap to test scap internal update
16:43 bd808|deploy: Updated scap to 663f137 (Check php syntax with parallel `php -l`)
16:42 bd808|deploy: Trebuchet sync for scap reporting failure from osmium.eqiad.wmnet, mw1053.eqiad.wmnet, searchidx1001.eqiad.wmnet, fenari.wikimedia.org, and mw1110.eqiad.wmnet
16:41 bd808|deploy: Trebuchet update for scap reporting failure from osmium.eqiad.wmnet, searchidx1001.eqiad.wmnet, fenari.wikimedia.org and mw1110.eqiad.wmnet
16:00 _joe_: mw1018 and mw1021 in the hhvm appservers pool
15:35 logmsgbot: reedy Synchronized docroot and w: Update symlinks to use /srv/mediawiki (duration: 00m 16s)
15:34 hashar: Jenkins: deleting /srv/ssd/jenkins-slave/workspace/*testextensions-master on gallium and lanthanum.
15:25 logmsgbot: andrew Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 03s)
15:23 logmsgbot: andrew Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 19s)
15:20 manybubbles: SWAT complete
15:16 logmsgbot: manybubbles Synchronized php-1.24wmf20/extensions/VisualEditor/: swat update for wmf20 (duration: 00m 25s)
15:13 hashar: Jenkins: mediawiki extensions phpunit jobs should pass more or less until the CI system is sent into orbit and dies out horribly. In such a case ping me / phone.
15:08 logmsgbot: manybubbles Synchronized php-1.24wmf21/extensions/VisualEditor/: SWAT visual editor update wmf21 (duration: 00m 07s)
14:52 ottomata: set vm.dirty_expire_centisecs to 10000 (was 30000) on analytics1021 to experiment with paging and kafka-zookeeper timeouts
14:36 godog: stopped htcp-purger on ms1004 RT #8358
14:32 godog: silenced ms-be1014 until tomorrow, pending forced reboot
14:28 hashar: Jenkins: breaking continuous integration for MediaWiki repositories. Extensions are now tested with mediawiki/vendor, and mediawiki/core is checked out to the patch branch if it exists. 160656
14:20 akosiaris_: restarted apache on fenari , it was leaking memory, situation back to normal, cause unknown yet
14:12 akosiaris_: stopped apache on fenari . It was in swap, investigating
12:35 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool s2 db1054, s3 db1027, s4 db1056, s5 db1037 (duration: 00m 10s)
12:26 godog: reboot ms-be1014, xfs issues
12:22 godog: temporarily chgrp wikidev /var/log/hhvm/error.log on mw1018
12:21 logmsgbot: reedy Synchronized php-1.24wmf20/LocalSettings.php: Fix path to be /srv based (duration: 00m 32s)
11:25 logmsgbot: reedy Synchronized docroot and w: (no message) (duration: 00m 35s)
11:12 logmsgbot: reedy Purged l10n cache for 1.24wmf19
11:12 logmsgbot: reedy Purged l10n cache for 1.24wmf18
11:10 logmsgbot: reedy Purged l10n cache for 1.24wmf15
09:21 _joe_: reimaging mw1018 and mw1021 w HAT: removing from pybal, etc.
06:29 springle: xtrabackup clone db1037 to db2023
05:31 springle: xtrabackup clone db1056 to db2019
04:01 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Sep 16 04:01:05 UTC 2014 (duration 1m 4s)
03:11 springle: xtrabackup clone db1027 to db2018
03:04 logmsgbot: LocalisationUpdate completed (1.24wmf21) at 2014-09-16 03:04:46+00:00
02:53 springle: xtrabackup clone db1054 to db2017
02:50 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool s2 db1054, s3 db1027, s4 db1056, s5 db1037 for codfw cloning (duration: 01m 12s)
02:39 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1036, depool db1002 (duration: 00m 07s)
02:31 logmsgbot: LocalisationUpdate completed (1.24wmf20) at 2014-09-16 02:31:16+00:00

September 15

23:32 logmsgbot: maxsem Synchronized php-1.24wmf21/resources/: SWAT: https://gerrit.wikimedia.org/r/#/c/160488/1 https://gerrit.wikimedia.org/r/#/c/160543/ (duration: 00m 06s)
23:26 bblack: restarting lvs1001 for HT disable + kernel upgrade
23:19 logmsgbot: maxsem Synchronized php-1.24wmf21/extensions/VisualEditor/: SWAT: https://gerrit.wikimedia.org/r/#/c/160554/ (duration: 00m 07s)
23:12 bblack: restarting lvs1002 for HT disable + kernel upgrade
23:07 Krinkle: Running sample job on integration-slave1006 and warming up npmjs.org cache
22:56 Krinkle: Running sample job on integration-slave1008 and warming up npmjs.org cache
22:49 Krinkle: Running sample job on integration-slave1007 and warming up npmjs.org cache
22:48 Krinkle: Pooling the newly setup Trusty-based Jenkins slaves (integration-slave1006, integration-slave1007 and integration-slave1008)
22:42 bblack: dropping static routes for 2620:0:861:ed1a::[d,f,10,11] -> lvs1005 from cr[12]-eqiad (only 11 is of any consequence, misc-web-lb, and they're advertised by bgp and this is preventing failover to lvs1002)
21:28 cscott: updated OCG to version 188a3c221d927bd0601ef5e1b0c0f4a9d1cdbd31
20:46 subbu: deployed Parsoid version b845bff9
18:49 logmsgbot: ejegg Synchronized php-1.24wmf20/extensions/CentralNotice/: Update CentralNotice to remove jquery.json dependency (duration: 00m 23s)
18:46 hoo: Sync to tmh100[12] failed, according to awight
18:44 logmsgbot: ejegg Synchronized php-1.24wmf21/extensions/CentralNotice/: Update CentralNotice to remove jquery.json dependency (duration: 00m 09s)
18:43 manybubbles: performance tests show cirrus should handle jawiki with no problem but if load spirals out of control and I'm not around then revert https://gerrit.wikimedia.org/r/#/c/160465/
18:40 hoo: Local part of the global rename of Gnumarcoo => .avgas fatally timed out on itwiki. This needs to be fixed by hand.
18:40 manybubbles: Setting Cirrus to jawiki's primary search backend went well but Japan is mostly asleep. If Elasticsearch load takes a turn for the worse in four or five hours then we'll know how it went.
17:14 bd808: Restarted elasticsearch on logstash1003; 2014-09-14T09:33:57Z java.lang.OutOfMemoryError
17:09 _joe_: killing salt-call on all mediawiki hosts
17:06 bd808: Restarted elasticsearch on logstash1001; 2014-09-15T06:12:09Z java.lang.OutOfMemoryError
17:04 bblack: using salt to kill salt-minion everywhere...
17:02 bd808: Restarted logstash on logstash1001. I hoped this would fix the dashboards, but it looks like the backing elasticsearch cluster is too sad for them to work at the moment.
16:55 bd808: Restarted hung elasticsearch service on logstash1002
16:15 manybubbles: jawiki now has cirrus as primary. we're back to where we were before the great cascading failure of two months ago
16:13 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 06s)
15:29 logmsgbot: marktraceur Synchronized php-1.24wmf21/extensions/MultimediaViewer/: [SWAT] Several backports for metrics and bugfixes in Media Viewer (duration: 00m 07s)
15:27 logmsgbot: marktraceur Synchronized php-1.24wmf20/extensions/MultimediaViewer/: [SWAT] Several backports for metrics and bugfixes in Media Viewer (duration: 00m 07s)
15:18 logmsgbot: marktraceur Synchronized php-1.24wmf21/extensions/GeoCrumbs/GeoCrumbs.class.php: [SWAT] Handle return value NULL of GeoCrumbs::getParserCache (duration: 00m 07s)
15:17 logmsgbot: marktraceur Synchronized php-1.24wmf20/extensions/GeoCrumbs/GeoCrumbs.class.php: [SWAT] Handle return value NULL of GeoCrumbs::getParserCache (duration: 00m 07s)
15:06 logmsgbot: marktraceur Synchronized wmf-config/: [SWAT] Remove 'renameuser' right from bureaucrats on CentralAuth wikis (duration: 00m 09s)
14:54 logmsgbot: aude Synchronized wmf-config/Wikibase.php: Bump wikibase memcached key for test.wikidata, test, test2 (duration: 00m 16s)
14:54 hashar: Updated Jenkins Job Builder fork: e5c0c61..2d74b16
14:50 logmsgbot: aude Finished scap: Put test.wikidata back on mw1.24-wmf19 extension branch (duration: 37m 27s)
14:43 manybubbles: restarting the enwiki cirrus reindex process - it crashed over the weekend. why you crash and leave error message "1". "1" is not a useful error message.
14:13 logmsgbot: aude Started scap: Put test.wikidata back on mw1.24-wmf19 extension branch
13:03 _joe_: fenari is swapping hard, restarting apache which was eating up all the RAM
09:20 logmsgbot: hashar Synchronized wmf-config/InitialiseSettings.php: *.scienceimage.csiro.au to the wgCopyUploadsDomains 159999 bug 70771 (duration: 00m 06s)
09:15 hashar: Jenkins: apt-get upgrade on prod slaves (updates php5 / libc / jdk 7)
03:09 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1036 (duration: 00m 09s)
02:03 logmsgbot: LocalisationUpdate failed: mwversionsinuse returned empty list
01:47 logmsgbot: hoo Synchronized wmf-config/liquidthreads.php: Remove global $path (duration: 00m 07s)
01:47 logmsgbot: hoo Synchronized wmf-config/flaggedrevs.php: Remove global $path (duration: 00m 10s)

September 14

20:37 ori_: enabling puppet on mw1053
20:11 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1062, locked up (duration: 00m 09s)
13:24 _joe_: stopped puppet and the JR on mw1053
12:42 hoo: Ran sync-common on mw1053 to stop "Unrecognized job type 'ChangeNotification'." exceptions
11:14 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool es1005 (duration: 00m 07s)
10:37 springle: restart es1005
09:56 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool es1007, depool es1005 (duration: 00m 10s)
02:01 logmsgbot: LocalisationUpdate failed: mwversionsinuse returned empty list
00:45 ori_: fenari appears to still have twemproxy (in addition to nutcracker); decom'ing.
00:29 ori_: restarting apache2 on fenari

September 13

04:42 legoktm: global rename for Trevor Parscal (WMF) unstuck itself, yay
04:22 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Sep 13 04:22:04 UTC 2014 (duration 22m 3s)
03:51 legoktm: global rename for Trevor Parscal --> Trevor Parscal (WMF) looks stuck on metawiki and mswiki, in queued state for both but showJobs.php says the jobs are active and claimed
03:11 logmsgbot: LocalisationUpdate completed (1.24wmf21) at 2014-09-13 03:11:40+00:00
02:38 logmsgbot: LocalisationUpdate completed (1.24wmf20) at 2014-09-13 02:38:26+00:00
01:45 logmsgbot: ori Synchronized php-1.24wmf21/extensions/Flow: Update flow for I4da934dfe (duration: 00m 06s)
01:45 logmsgbot: ori Synchronized php-1.24wmf20/extensions/Flow: Update flow for I4da934dfe (duration: 00m 06s)
01:41 logmsgbot: ori Synchronized php-1.24wmf20/extensions/Flow: Update flow for I4da934dfe (duration: 00m 08s)

September 12

21:26 csteipp: deployed fixes for bugs 70620, 69008
20:37 logmsgbot: mattflaschen Synchronized php-1.24wmf21/extensions/GettingStarted/: Deploy to fix GettingStarted bucketing for users with null registration date (duration: 00m 05s)
20:37 logmsgbot: mattflaschen Synchronized php-1.24wmf20/extensions/GettingStarted/: Deploy to fix GettingStarted bucketing for users with null registration date (duration: 00m 07s)
19:34 legoktm: running migratePass0.php across all CentralAuth wikis
17:43 logmsgbot: ori updated /a/common to I4e4187285: Rename some constants to clarify their meaning and purpose
14:52 manybubbles: rebuilding enwiki's Cirrus index for more performance testing. Please be faster now. k?
08:37 _joe_: rolling restart of pybal finished. Adding note on Fenari
08:19 _joe_: reactivated puppet on all lvs hosts, esams almost done, pending eqiad
08:06 _joe_: new pybal conf applied in all of ulsfo
07:39 _joe_: changing pybal config place; stopping puppet on all loadbalancers
04:27 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Sep 12 04:27:17 UTC 2014 (duration 27m 16s)
03:15 logmsgbot: LocalisationUpdate completed (1.24wmf21) at 2014-09-12 03:15:57+00:00
03:08 logmsgbot: mattflaschen Finished scap: One last CSS fix (wrapping issue for error state) for GettingStarted A/B test (duration: 24m 38s)
02:43 logmsgbot: mattflaschen Started scap: One last CSS fix (wrapping issue for error state) for GettingStarted A/B test
02:39 logmsgbot: LocalisationUpdate completed (1.24wmf20) at 2014-09-12 02:39:35+00:00
01:33 logmsgbot: mattflaschen Synchronized php-1.24wmf21/extensions/GettingStarted/: CSS tweaks for GettingStarted A/B test (duration: 00m 07s)
01:32 logmsgbot: mattflaschen Synchronized php-1.24wmf20/extensions/GettingStarted/: CSS tweaks for GettingStarted A/B test (duration: 00m 21s)
01:29 logmsgbot: ori Synchronized wmf-config/wikitech.php: Ia5b81076e: Update path reference for /srv/mediawiki (duration: 00m 04s)
01:28 logmsgbot: ori updated /a/common to Ia5b81076e: Update path reference for /srv/mediawiki
01:19 ori: manually migrated /u/l/a/common-local to /srv/mediawiki on virt1000
00:36 logmsgbot: ori Synchronized php-1.24wmf21/extensions/Wikidata: Update Wikidata to tip of master for I23b7eb54b8e (Bug: 70747) (duration: 00m 08s)
00:12 logmsgbot: esanders Synchronized php-1.24wmf21/resources/lib/oojs-ui/: (no message) (duration: 00m 03s)
00:12 logmsgbot: esanders Synchronized php-1.24wmf21/extensions/MultimediaViewer/: (no message) (duration: 00m 07s)
00:00 logmsgbot: esanders Finished scap: SWAT deploy (duration: 28m 39s)

September 11

23:31 logmsgbot: esanders Started scap: SWAT deploy
23:29 logmsgbot: mattflaschen Finished scap: Deploy new GettingStarted recommendations A/B test (duration: 99m 34s)
23:15 logmsgbot: esanders scap failed: LockFailedError Failed to lock /var/lock/scap: [Errno 11] Resource temporarily unavailable (duration: 00m 00s)
23:00 mutante: restarting icinga-wm for config change
21:49 logmsgbot: mattflaschen Started scap: Deploy new GettingStarted recommendations A/B test
21:14 Krinkle: Stopping/starting zuul
21:08 andrewbogott: restarting zuul on gallium
20:58 andrewbogott: restarted jenkins, maybe
20:56 ori: graceful'd apache on mw1053, missed it earlier
20:49 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: I1f3234746: Revert Scribunto: double the Lua CPU limit on the job runners (duration: 00m 05s)
20:48 logmsgbot: ori updated /a/common to I1f3234746: Revert "Scribunto: double the Lua CPU limit on the job runners"
20:42 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 14s)
20:15 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 13s)
20:15 andrewbogott: syncing virt1000, again in hopes of moving to wmf20
20:08 logmsgbot: reedy Synchronized php-1.24wmf21/extensions/Wikidata/: (no message) (duration: 00m 17s)
19:58 Reedy: Running sync-common on mw1024
19:52 Reedy: Running manual sync-common on mw1138
19:51 logmsgbot: reedy Synchronized wmf-config/: Fix Zero settings (duration: 00m 15s)
19:49 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.24wmf21
19:44 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias to 1.24wmf20
19:20 mutante: graceful'ed apache on mw1143
19:16 Reedy: running sync-common on mw1143
19:10 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: (no message)
19:02 bd808: Restarted elasticsearch on logstash1003 -- Java OOM error in logs and not recovering shards
18:54 ori: graceful'd all apaches
18:51 ori: graceful'd apache on mw1047, mw1151, mw1137, mw1146 and mw1076
18:46 logmsgbot: ori Synchronized php-1.24wmf19/includes/WebStart.php: (no message) (duration: 00m 06s)
18:45 logmsgbot: ori Synchronized php-1.24wmf19/includes/profiler/Profiler.php: (no message) (duration: 00m 07s)
18:17 logmsgbot: reedy Started scap: testwiki to 1.24wmf21 and build l10n cache take 3
18:16 logmsgbot: reedy scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki="testwiki" --list-file="/a/common/wmf-config/extension-list" --output="/tmp/tmp.Nd45X2RONi" --verbose' returned non-zero exit status 1 (duration: 01m 18s)
18:14 logmsgbot: reedy Started scap: testwiki to 1.24wmf21 and build l10n cache
18:13 logmsgbot: reedy scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki="testwiki" --list-file="/a/common/wmf-config/extension-list" --output="/tmp/tmp.IH8przTNHs" ' returned non-zero exit status 1 (duration: 04m 59s)
18:08 logmsgbot: reedy Started scap: testwiki to 1.24wmf21 and build l10n cache
18:02 manybubbles: raised logging on Elasticsearch cluster temporarily to get more information about merging - a process super important to keeping the index up to date in "real time"
17:20 logmsgbot: ori updated /a/common to I0bda3deab: Replace remaining references to /u/l/a/common
17:18 logmsgbot: ori updated /a/common to I37b0a8338: Get rid of MULTIVER_CDB_DIR_{APACHE,HOME}
16:57 andrewbogott: sync-common on virt1000 -- with any luck this will upgrade us to wmf20
16:56 logmsgbot: andrew rebuilt wikiversions.cdb and synchronized wikiversions files: (no message)
16:53 logmsgbot: bd808 Finished scap: Preparing to move wikitech to 1.24wmf20 (second try) (duration: 24m 25s)
16:46 andrewbogott: apache graceful on mw1039
16:33 bd808|deploy: andrewbogott did apache graceful on mw1120 to stop wikidata APC logspam
16:29 logmsgbot: bd808 Started scap: Preparing to move wikitech to 1.24wmf20 (second try)
16:22 logmsgbot: andrew Finished scap: Preparing to move wikitech to 1.24wmf20 (duration: 06m 45s)
16:19 bd808: Restarted logstash on logstash1001. Log empty and events not being stored in elasticsearch
16:15 logmsgbot: andrew Started scap: Preparing to move wikitech to 1.24wmf20
15:45 bblack: icinga config is correct now, back to normal puppet updates
15:24 bblack: restarted icinga, manually removed some labsy things that were broken in config and temporarily disabled puppet :p
14:44 _joe_: php upgrade finished
14:23 _joe_: upgrading php across the cluster: libapache2-mod-php5 php5-cli php-pear php5 php5-common php5-curl php5-dev php5-intl php5-mysql php5-xmlrpc
13:04 akosiaris: uploaded php5_5.3.10-1ubuntu3.14+wmf1 on apt.wikimedia.org
10:00 _joe_: enabled puppet on mw1053
09:38 _joe_: gracefulling mw1200 mw1196 and mw1186 as they have APC issues
09:21 _joe_: upgrading hhvm and hhvm-luasandbox across the production cluster
09:00 akosiaris: upgrading php5 to 5.3.10-1ubuntu3.14+wmf1 on mw1212
08:34 _joe_: updating php-pear php5 php5-cli php5-common php5-curl php5-dev php5-intl php5-mysql php5-xmlrpc libapache2-mod-php5 on mw1018, see USN 2344-1
03:41 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Sep 11 03:41:03 UTC 2014 (duration 41m 2s)
02:49 logmsgbot: LocalisationUpdate completed (1.24wmf20) at 2014-09-11 02:49:26+00:00
02:36 logmsgbot: LocalisationUpdate completed (1.24wmf19) at 2014-09-11 02:36:37+00:00
02:23 logmsgbot: LocalisationUpdate completed (1.24wmf15) at 2014-09-11 02:23:29+00:00
00:28 mutante: graceful'ed Apaches on mw1171, mw1187
00:25 logmsgbot: ori Synchronized wmf-config: Id607bf36d: Update remaining references to /u/l/a/common-local (duration: 00m 03s)
00:25 logmsgbot: ori Synchronized multiversion: Id607bf36d: Update remaining references to /u/l/a/common-local (duration: 00m 04s)
00:22 logmsgbot: ori Synchronized docroot and w: Id607bf36d: Update remaining references to /u/l/a/common-local (duration: 00m 04s)
00:07 logmsgbot: ori updated /a/common to Id607bf36d: Update remaining references to /u/l/a/common-local

September 10

23:44 mutante: graceful'ed mw1202 apache
23:29 mutante: deleted labstore1003.eqiad.wmnet.org from puppet stored resource db, fixes puppet runs on hosts with ssh host key collection
23:26 logmsgbot: oblivian gracefulled all apaches
23:22 logmsgbot: maxsem Synchronized php-1.24wmf20/includes/specialpage/SpecialPageFactory.php: https://gerrit.wikimedia.org/r/#/c/159526/ (duration: 00m 03s)
23:22 logmsgbot: maxsem Synchronized php-1.24wmf19/includes/specialpage/SpecialPageFactory.php: https://gerrit.wikimedia.org/r/#/c/159526/ (duration: 00m 03s)
23:21 logmsgbot: maxsem Synchronized php-1.24wmf20/extensions/CentralAuth/: (no message) (duration: 00m 03s)
23:21 logmsgbot: maxsem Synchronized php-1.24wmf19/extensions/CentralAuth/: (no message) (duration: 00m 04s)
23:19 logmsgbot: maxsem Synchronized php-1.24wmf10/resources/: https://gerrit.wikimedia.org/r/#/c/159513/ (duration: 00m 05s)
22:52 mutante: labstore1003 - (earlier) revoked salt and puppet key and signed new after hostname fix - same salt-minion puppet errors that happen after reinstalls
19:52 Reedy: Created Echo tables on extension1 for cawikimedia
19:51 RobH: puppet disabled on carbon (install server) for a livehack test of config setting
18:51 yurikR: yurik CommonSettings.php - zerowiki perm changes
18:51 logmsgbot: yurik Synchronized wmf-config/CommonSettings.php: (no message) (duration: 01m 05s)
18:26 logmsgbot: yurik Synchronized php-1.24wmf20/extensions/ZeroBanner: (no message) (duration: 01m 09s)
18:22 logmsgbot: yurik Synchronized php-1.24wmf19/extensions/ZeroBanner: (no message) (duration: 01m 11s)
18:00 manybubbles: cirrus index rebuild for test2wiki went well - doing the rest of group0
17:35 manybubbles: rebuilding cirrus index for test2wiki to test that some performance enhancements don't break anything. test2wiki is too small to see any gain from the enhancements though.
17:25 Reedy: mw1126, mw1116, mw1122, mw1146, mw1121, mw1136, mw1114, mw1068 have been gracefulled
17:10 bd808: Restarted logstash on logstash1001
16:03 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: nlwiki cirrus (duration: 00m 04s)
15:44 logmsgbot: reedy Synchronized docroot and w: (no message) (duration: 00m 15s)
15:02 logmsgbot: demon Synchronized wmf-config/wikitech.php: no-op (duration: 00m 06s)
09:13 godog: rolling restart swift-proxy on ms-fe1*
04:17 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Sep 10 04:17:36 UTC 2014 (duration 17m 35s)
03:08 logmsgbot: LocalisationUpdate completed (1.24wmf20) at 2014-09-10 03:07:59+00:00
02:36 logmsgbot: LocalisationUpdate completed (1.24wmf19) at 2014-09-10 02:36:00+00:00
02:28 ori: updated salt key for iridium and restarted salt-minion
02:18 mutante: started salt-minion on iridium

September 9

23:15 Krinkle: Reloading Zuul to deploy I26bc21ed2938e97e7ed6f6b
23:15 logmsgbot: demon Synchronized php-1.24wmf20/extensions/CirrusSearch: Various fixes for things (duration: 00m 05s)
23:00 mutante: added wikimedia.org to search in resolv.conf on terbium
22:42 logmsgbot: ebernhardson Synchronized wmf-config/InitialiseSettings.php: Deploy config change I158e7c6852 (duration: 00m 04s)
22:23 Krinkle: Reloading Zuul to deploy I27024680c74ca0130
22:21 logmsgbot: ebernhardson Finished scap: Bump Echo and Flow versions in 1.24wmf19 (duration: 31m 25s)
21:49 logmsgbot: ebernhardson Started scap: Bump Echo and Flow versions in 1.24wmf19
20:42 akosiaris: service gmetad restart on nickel.wikimedia.org due to ganglia web not working
20:15 cscott: updated OCG to version c9a2b4cf2502479eeabed07ab2de728695d96e46
19:05 mutante: killed jgonera's screen session on stat1002 - puppet failed to deactivate otherwise
18:46 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 14s)
18:32 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 15s)
18:31 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Add cawikimedia
18:28 logmsgbot: reedy Synchronized multiversion/: (no message) (duration: 00m 14s)
18:14 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias to 1.24wmf20
16:03 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: eswiki getting cirrus (duration: 00m 04s)
15:32 bblack: deploying large DNS change https://gerrit.wikimedia.org/r/#/c/158382/ - be on the lookout for any related fallout from here...
15:27 marktraceur: [SCAP] Deployed fix for oojs class names at James_F's behest, sorry for lack of message.
15:26 logmsgbot: marktraceur Synchronized php-1.24wmf20/extensions/MobileFrontend/less/modules/editor/VisualEditorOverlay.less: (no message) (duration: 00m 07s)
15:08 logmsgbot: marktraceur Synchronized php-1.24wmf20/tests/phpunit/includes/changes/OldChangesListTest.php: [SWAT] Fix undefined argument (css classes) in OldChangesList. (duration: 00m 07s)
15:06 logmsgbot: marktraceur Synchronized php-1.24wmf20/includes/changes/OldChangesList.php: [SWAT] Fix undefined argument (css classes) in OldChangesList. (duration: 00m 07s)
11:29 _joe_: git.wikimedia.org works now, no action needed
11:26 MatmaRex: git.wikimedia.org is down: Error: 503, Service Unavailable
10:04 _joe_: also re-enabling puppet
10:02 _joe_: restarting manually apache on mw1178,mw1192,mw1163,mw1130,mw1018 as they started with the wrong pidfile before my fix
09:24 _joe_: disabling puppet on appservers
08:55 godog: launched "iptables" on tin to check current rules and it loaded iptables modules, logging for future reference
08:10 _joe_: re-enabling puppet on appservers and imagescalers, change is good
08:08 _joe_: restarted apache2 on mw1018
08:06 _joe_: stopping apache on mw1018 for inspection
07:36 _joe_: that was on appservers
07:36 _joe_: disabling puppet, releasing a potentially harmful apache change
04:56 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Sep 9 04:56:25 UTC 2014 (duration 56m 24s)
03:44 logmsgbot: LocalisationUpdate completed (1.24wmf20) at 2014-09-09 03:44:07+00:00
03:11 logmsgbot: LocalisationUpdate completed (1.24wmf19) at 2014-09-09 03:11:27+00:00
02:38 logmsgbot: LocalisationUpdate completed (1.24wmf15) at 2014-09-09 02:38:38+00:00
01:02 logmsgbot: ebernhardson Synchronized php-1.24wmf20/extensions/Flow/includes/Content/BoardContentHandler.php: Sync BoardContentHandler.php for Flow in 1.24wmf20 (duration: 00m 04s)
00:22 mutante: re-enabled mw1070 in pybal
00:19 logmsgbot: ebernhardson Finished scap: Repeat SWAT scap deployment due to possible sync-common failure (duration: 38m 50s)

September 8

23:59 ori: restarted rsync on mw1070 to unblock scap
23:40 logmsgbot: ebernhardson Started scap: Repeat SWAT scap deployment due to possible sync-common failure
23:39 logmsgbot: ebernhardson Finished scap: SWAT deploy updates to Flow, Echo and Thanks (duration: 24m 00s)
23:34 mutante: disabled mw1070 in pybal because it refused sync
23:31 ebernhardson: scap failed to connect to mw1070. Repeated message: rsync: failed to connect to mw1070.eqiad.wmnet (10.64.16.50): Connection refused (111)
23:15 logmsgbot: ebernhardson Started scap: SWAT deploy updates to Flow, Echo and Thanks
23:02 logmsgbot: ebernhardson Synchronized wmf-config/InitialiseSettings.php: gerrit:159089 Enable $wgContentHandlerUseDB on mediawikiwiki, testwiki, & test2wiki (duration: 00m 05s)
20:14 subbu: deployed Parsoid ce108cb5
18:01 logmsgbot: demon Synchronized php-1.24wmf19/extensions/Wikidata: Updating Wikidata to f1d2110 (duration: 00m 09s)
17:19 mutante: disabled notifications for puppet freshness on neon
16:19 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: svwiki: Cirrus as primary (duration: 00m 04s)
15:42 logmsgbot: manybubbles Synchronized php-1.24wmf20/extensions/Wikidata/: SWAT update wikidata to fix add links widget (duration: 00m 06s)
15:32 logmsgbot: manybubbles Synchronized php-1.24wmf20/extensions/LiquidThreads/: SWAT update liquidthreads to fix some missing images (duration: 00m 04s)
15:28 manybubbles: 15:13:53 Synchronized php-1.24wmf19/extensions/WikiLove/: SWAT fix for WikiLove (duration: 00m 04s)
15:28 manybubbles: this is the missing log:
15:27 manybubbles: sync logging was down so it missed some syncing I just did.
15:25 logmsgbot: manybubbles Synchronized php-1.24wmf20/extensions/WikiLove/: (no message) (duration: 00m 05s)
15:20 logmsgbot: manybubbles Synchronized wmf-config: SWAT another cirrus setting update (duration: 00m 04s)
15:10 logmsgbot: manybubbles Synchronized wmf-config: SWAT finish updating Cirrus settings (duration: 00m 05s)
15:10 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: SWAT update some cirrus settings (duration: 00m 04s)
15:10 cmjohnson1: shutting down neon for memory upgrade
14:41 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1072 (duration: 00m 09s)
12:23 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1073, depool db1072 (duration: 00m 06s)
11:07 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1073 (duration: 00m 09s)
10:55 _joe_: re-enabled puppet, the change results in a no-op as expected
10:42 _joe_: disabling puppet on all appservers while updating apache config.
04:58 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: move enwiki api traffic to db1051/db1066 (duration: 00m 09s)
03:37 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Sep 8 03:36:13 UTC 2014 (duration 36m 12s)
02:45 logmsgbot: LocalisationUpdate completed (1.24wmf20) at 2014-09-08 02:44:47+00:00
02:33 logmsgbot: LocalisationUpdate completed (1.24wmf19) at 2014-09-08 02:32:37+00:00
02:20 logmsgbot: LocalisationUpdate completed (1.24wmf15) at 2014-09-08 02:19:52+00:00

September 7

23:35 Tim: upgrading liblua everywhere
20:36 ori: mw1017: upgraded HHVM from 3.3-dev+20140728+wmf5 to 3.3-dev+20140728+wmf6
15:12 apergos: manually changed /etc/hosts entry on analytics1004 from having "analyticas1004.eqiad.wmnet" to "analytics1004.eqiad.wmnet"
06:15 godog: powercycle ms-be1005, not even responsive on console
03:30 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Sep 7 03:29:51 UTC 2014 (duration 29m 50s)
02:43 logmsgbot: LocalisationUpdate completed (1.24wmf20) at 2014-09-07 02:42:12+00:00
02:31 logmsgbot: LocalisationUpdate completed (1.24wmf19) at 2014-09-07 02:30:15+00:00
02:18 logmsgbot: LocalisationUpdate completed (1.24wmf15) at 2014-09-07 02:17:44+00:00

September 6

03:43 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Sep 6 03:42:22 UTC 2014 (duration 42m 21s)
02:51 logmsgbot: LocalisationUpdate completed (1.24wmf20) at 2014-09-06 02:50:35+00:00
02:38 logmsgbot: LocalisationUpdate completed (1.24wmf19) at 2014-09-06 02:37:41+00:00
02:25 logmsgbot: LocalisationUpdate completed (1.24wmf15) at 2014-09-06 02:24:35+00:00

September 5

23:28 logmsgbot: kaldari Synchronized wmf-config/mobile-labs.php: enabling wikigrok on beta labs (en only) (duration: 00m 03s)
23:28 logmsgbot: kaldari Synchronized wmf-config/InitialiseSettings-labs.php: enabling wikigrok on beta labs (en only) (duration: 00m 04s)
23:27 logmsgbot: kaldari updated /a/common to Iec209bde0: Map config var for $wgMFEnableWikiGrok
22:25 logmsgbot: kaldari Synchronized wmf-config/InitialiseSettings-labs.php: enabling wikigrok on beta labs (en only) (duration: 00m 05s)
22:25 logmsgbot: kaldari updated /a/common to I6039956eb: Enable Wikigrok prototype for beta labs (enwiki only)
22:24 awight: Deleted Light User and Merkle roles from the CRM
20:20 RobH: coms folks still accessing blog data on holmium, powering back up
20:18 bblack: restarted cp1056 bits cache and re-enabled in pybal
18:34 mark: Depooled cp1056 for testing
17:50 logmsgbot: ori Synchronized docroot and w: Iaa7518613: Fix spelling in symlink (duration: 00m 15s)
17:45 logmsgbot: ori Synchronized docroot and w: I55a01a712: Fix relative symlinks for bits/static-master (duration: 00m 13s)
13:00 Jeff_Green: lutetium dist-upgrade and reboot
12:04 legoktm: running extensions/GlobalCssJs/removeOldManualUserPages.php for m:GlobalCssJs
07:59 springle: dump es1007 to db1004, tokudb external storage page compression test. ok to kill in emergency
06:40 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool es1007 (duration: 00m 07s)
04:36 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Sep 5 04:35:11 UTC 2014 (duration 35m 10s)
04:27 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool es1002 (duration: 00m 06s)
04:01 springle: reboot es1002, fs check
03:47 logmsgbot: LocalisationUpdate completed (1.24wmf20) at 2014-09-05 03:46:28+00:00
03:46 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool es1002 for upgrade (duration: 00m 07s)
03:37 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1062 and db1068 (duration: 02m 06s)
03:10 logmsgbot: LocalisationUpdate completed (1.24wmf19) at 2014-09-05 03:09:20+00:00
02:56 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1062 and db1068 for upgrade (duration: 00m 56s)
02:39 logmsgbot: LocalisationUpdate completed (1.24wmf15) at 2014-09-05 02:38:32+00:00
01:01 manybubbles: applied same elasticsearch configuration to dewiki, eswiki, zhwiki, and frwiki
00:18 manybubbles: configured elasticsearch to force enwiki's content shards to stay off of the same nodes. That ought to help performance.

September 4

23:34 logmsgbot: catrope Synchronized php-1.24wmf20/extensions/VisualEditor/: (no message) (duration: 00m 05s)
23:34 logmsgbot: catrope Synchronized php-1.24wmf20/extensions/ZeroPortal/: (no message) (duration: 00m 04s)
23:34 logmsgbot: catrope Synchronized php-1.24wmf20/extensions/Flow/: (no message) (duration: 00m 05s)
23:34 logmsgbot: catrope Synchronized php-1.24wmf20/includes/specials/: (no message) (duration: 00m 04s)
23:31 logmsgbot: catrope Synchronized php-1.24wmf19/extensions/ZeroPortal: (no message) (duration: 00m 05s)
23:30 logmsgbot: catrope Synchronized php-1.24wmf19/extensions/Flow: (no message) (duration: 00m 05s)
22:52 logmsgbot: reedy Finished scap: consistency (duration: 20m 44s)
22:31 logmsgbot: reedy Started scap: consistency
22:28 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Sep 4 22:27:45 UTC 2014 (duration 54m 38s)
21:54 bd808: sync-dir failure was really on osmium, not mw1161; confusing error messages are confusing
21:50 bd808: Running sync-common on mw1161 to try and reproduce error seen during sync-file
21:43 logmsgbot: spage Synchronized wmf-config/InitialiseSettings.php: Enable Flow on pages, including frwiki and hewiki (duration: 00m 09s)
21:40 logmsgbot: spage updated /a/common to Ib0aaa60f0: Enable Flow on several pages
21:08 logmsgbot: LocalisationUpdate completed (1.24wmf20) at 2014-09-04 21:07:10+00:00
20:56 MaxSem: Running cleanupPageProps.php from terbium, now for realz
20:42 mutante: restarting icinga-wm, making it join #wikidata for custom output
20:16 logmsgbot: LocalisationUpdate completed (1.24wmf19) at 2014-09-04 20:15:35+00:00
19:56 Reedy: mw1088 and mw1100 rsync errors during the manual l10n update
19:25 logmsgbot: LocalisationUpdate completed (1.24wmf15) at 2014-09-04 19:23:57+00:00
18:32 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.24wmf20
18:26 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias to 1.24wmf19
18:11 logmsgbot: reedy Synchronized php-1.24wmf19: (no message) (duration: 00m 55s)
18:10 logmsgbot: reedy Synchronized php-1.24wmf20: (no message) (duration: 00m 35s)
18:09 logmsgbot: reedy Finished scap: testwiki to 1.24wmf20 and build l10n cache (duration: 41m 33s)
18:05 mutante: restarting service gitblit on antimony
17:48 RobH: correction, simply suppressing alerts for the host in icinga is the better move, as the host isn't reclaimed yet, so not removing holmium from puppetstoreddb
17:46 RobH: stopping puppet on holmium and removing it from puppetstoreddb so it doesn't show in icinga once updated
17:45 RobH: shutting down holmium, as blog has migrated for a month now. Not yet wiping system, please leave for me (robh)
17:27 logmsgbot: reedy Started scap: testwiki to 1.24wmf20 and build l10n cache
16:44 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 17s)
16:36 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 14s)
15:53 logmsgbot: andrew Synchronized private/WikitechPrivateSettings.php: (no message) (duration: 00m 01s)
15:40 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 14s)
15:18 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: plwiki gets Cirrus (duration: 00m 06s)
14:56 bd808: ori updated scap to 773f95f (change deploy_dir to /srv/mediawiki) ~15 hours ago
08:16 _joe_: running sync-common on mw1017, trying to debug the hhvm bad state
06:37 godog: clear slowlog on elastic1004
05:25 jeremyb: temp hack fix deployed for morebots (here and labs, not the other instances)
04:47 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1035, warm up (duration: 00m 08s)
04:32 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Sep 4 04:31:28 UTC 2014 (duration 31m 27s)
03:43 logmsgbot: LocalisationUpdate completed (1.24wmf19) at 2014-09-04 03:42:34+00:00
03:13 logmsgbot: LocalisationUpdate completed (1.24wmf18) at 2014-09-04 03:11:58+00:00
02:41 logmsgbot: LocalisationUpdate completed (1.24wmf15) at 2014-09-04 02:40:45+00:00
01:08 mutante: production wants project name?
01:02 andrewbogott: the SAL still works, but the bot fails to acknowledge. Something to do with a change on wikitech
00:59 andrewbogott: testing the log
00:43 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 15s)

September 3

23:52 logmsgbot: reedy Synchronized php-1.24wmf15/includes/EditPage.php: (no message) (duration: 00m 14s)
23:43 logmsgbot: ori Synchronized docroot and w: (no message) (duration: 00m 05s)
23:28 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 14s)
23:04 logmsgbot: maxsem Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/157855/ https://gerrit.wikimedia.org/r/#/c/158265/ (duration: 00m 04s)
21:09 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: Disable GlobalUsage on labswiki (duration: 00m 15s)
20:59 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 15s)
20:47 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 15s)
20:45 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: touch (duration: 00m 15s)
20:38 logmsgbot: andrew Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 05s)
20:38 logmsgbot: andrew Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 04s)
20:37 logmsgbot: andrew Synchronized wmf-config/wikitech.php: (no message) (duration: 00m 04s)
20:16 subbu: deployed Parsoid version 78e55c6b (deploy repo sha c0761179)
18:52 logmsgbot: yurik Synchronized wmf-config: enabling graph ext on zerowiki & collabwiki (duration: 01m 06s)
18:51 MaxSem: Running sync-common on mw1163
18:48 logmsgbot: yurik Synchronized php-1.24wmf18/extensions/Graph/: (no message) (duration: 01m 09s)
18:47 logmsgbot: yurik Synchronized php-1.24wmf19/extensions/Graph/: (no message) (duration: 01m 05s)
16:52 logmsgbot: andrew Synchronized wmf-config/wikitech.php: (no message) (duration: 00m 03s)
16:52 logmsgbot: andrew Synchronized private/WikitechPrivateLdapSettings.php: (no message) (duration: 00m 03s)
16:51 logmsgbot: andrew Synchronized private/WikitechPrivateSettings.php: (no message) (duration: 00m 05s)
16:51 logmsgbot: andrew Synchronized wmf-config/wikitech.php: (no message) (duration: 00m 03s)
16:19 logmsgbot: andrew Synchronized wmf-config/wikitech.php: (no message) (duration: 00m 03s)
16:18 logmsgbot: andrew Synchronized wmf-config/wikitech.php: (no message) (duration: 00m 04s)
16:16 logmsgbot: andrew Synchronized wmf-config/wikitech.php: (no message) (duration: 00m 05s)
15:41 _joe_: mw1020 correctly reimaged, putting it in the hhvm pool
15:27 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: SWAT - Update another cirrus config - this time maybe it will work (duration: 00m 05s)
15:12 manybubbles: deployed throttling for Cirrus job named cirrusSearchLinksUpdate - it handles updating the index when a transcluded page changes - we'll have to check on the backlog over the next few hours/days to see if it stabilizes
15:11 logmsgbot: manybubbles Synchronized php-1.24wmf19/extensions/Wikidata/: (no message) (duration: 00m 07s)
15:07 manybubbles: mw1020 gets WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! during sync-dir call
15:07 logmsgbot: manybubbles Synchronized wmf-config/: SWAT deploy cirrus config changes - make sure to get mw1020 (duration: 00m 04s)
15:05 manybubbles: https://gerrit.wikimedia.org/r/#/c/157861/ didn't work as expected - dropped everything out of using the all field......
15:03 logmsgbot: manybubbles Synchronized wmf-config/: SWAT deploy cirrus config changes (duration: 00m 06s)
14:53 cmjohnson1: running sync-common on mw1178
14:52 cmjohnson1: adding mw1178 back to pybal
12:42 _joe_: typo: mw1020, not mw1120
12:41 _joe_: mw1120: remove from pybal, schedule downtime, reimage to HAT
11:23 godog: run gmond on elastic1002 manually to debug ES collector issues
11:17 godog: run gmond on elastic1001 manually to debug ES collector issues
07:55 _joe_: re-enabling mw1192, what we were seeing was probably load and not anything else
06:56 ori: restarted memcached on virt1000 due to cache pollution from migration (different memc drivers w/different encoding)
04:54 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Sep 3 04:53:50 UTC 2014 (duration 53m 49s)
03:51 logmsgbot: LocalisationUpdate completed (1.24wmf19) at 2014-09-03 03:50:17+00:00
03:17 logmsgbot: LocalisationUpdate completed (1.24wmf18) at 2014-09-03 03:16:37+00:00
02:43 logmsgbot: LocalisationUpdate completed (1.24wmf15) at 2014-09-03 02:42:24+00:00
00:18 mutante: deleted PDF files older than 3d and a huge 1G one on ocg1001 in reaction to monitoring complaints
00:00 logmsgbot: ori Synchronized php-1.24wmf18/extensions/WikimediaEvents: Update WikimediaEvents for cherry-picks (duration: 00m 03s)

September 2

23:49 logmsgbot: ori Synchronized php-1.24wmf19/extensions/WikimediaEvents: Update WikimediaEvents for cherry-picks (duration: 00m 03s)
23:32 logmsgbot: reedy Synchronized wmf-config/wikitech.php: (no message) (duration: 00m 13s)
23:22 logmsgbot: catrope Synchronized php-1.24wmf19/includes/OutputPage.php: 5094c0d9c (duration: 00m 05s)
23:14 Krinkle: Running extensions/GlobalCssJs/removeOldManualUserPages.php per m:GlobalCssJs
22:49 logmsgbot: reedy Synchronized wmf-config/wikitech.php: (no message) (duration: 00m 14s)
22:37 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 13s)
22:26 logmsgbot: reedy Synchronized wmf-config/wikitech.php: (no message) (duration: 00m 13s)
22:10 logmsgbot: reedy Synchronized docroot and w: (no message) (duration: 00m 14s)
21:57 logmsgbot: reedy Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 21s)
21:55 logmsgbot: reedy Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 24s)
21:51 logmsgbot: reedy Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 25s)
21:45 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 23s)
21:37 logmsgbot: reedy Synchronized wmf-config/db-eqiad.php: Wikitech db (duration: 00m 22s)
21:34 logmsgbot: bd808 Finished scap: no-op scap to build l10n for wikitech (duration: 55m 48s)
20:39 logmsgbot: bd808 Started scap: no-op scap to build l10n for wikitech
20:35 logmsgbot: bd808 Synchronized wmf-config/wikitech.php: eebc99a Require before instatiate (duration: 00m 04s)
20:31 logmsgbot: bd808 Synchronized private/PrivateSettings.php: Absolute path for WikitechPrivateSettings.php (duration: 00m 05s)
20:12 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: (no message)
20:05 MaxSem: Running cleanupPageProps.php everywhere
19:51 MaxSem: Running cleanupPageProps.php on mw.org and meta
19:14 logmsgbot: reedy Synchronized wmf-config/Wikibase.php: Bump epoch (duration: 00m 14s)
19:11 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias to 1.24wmf19, added labswiki too
18:56 logmsgbot: bd808 Synchronized fishbowl.dblist: Add labswiki (wikitech) (duration: 00m 05s)
18:25 logmsgbot: andrew Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 08s)
17:49 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: touch (duration: 00m 14s)
17:39 logmsgbot: andrew Synchronized multiversion/MWMultiVersion.php: (no message) (duration: 00m 04s)
17:14 logmsgbot: andrew Finished scap: Deploying wikitech config (duration: 33m 03s)
17:01 bd808: Fetched f711ea7 to /a/common on tin; not syncing because of in-process scap.
16:41 logmsgbot: andrew Started scap: Deploying wikitech config
16:21 logmsgbot: andrew scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki="cawikibooks" --list-file="/a/common/wmf-config/extension-list" --output="/tmp/tmp.SCRILhxGxO" ' returned non-zero exit status 1 (duration: 01m 17s)
16:20 logmsgbot: andrew Started scap: Deploying wikitech config
16:17 ottomata: installing newer version of webstatscollector on oxygen and gadolinium, restarting filter process on oxygen
16:08 logmsgbot: andrew Synchronized /a/common/private/WikitechPrivateSettings.php: (no message) (duration: 00m 04s)
16:07 logmsgbot: andrew Synchronized /a/common/private/PrivateSettings.php: (no message) (duration: 00m 03s)
16:03 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: Commons gets Cirrus as primary (duration: 00m 04s)
15:44 godog: bring mw1114 -> mw1131 to weight 15
15:21 logmsgbot: marktraceur Synchronized wmf-config/: [SCAP] SpecialCite is now CiteThisPage (duration: 00m 07s)
15:17 logmsgbot: marktraceur Synchronized wmf-config/InitialiseSettings.php: [SCAP] Enable the TemplateData GUI editor on Norwegian Wikipedia (duration: 00m 07s)
15:14 logmsgbot: marktraceur updated /a/common to Ia1758b21e: depool db1035 for upgrade, move s3 vslow/dump to db1019
15:06 logmsgbot: marktraceur Synchronized php-1.24wmf19/includes/EditPage.php: [SCAP] Revert "Toolbar: Only show on WikiText pages" (duration: 00m 07s)
15:05 logmsgbot: marktraceur Synchronized php-1.24wmf18/includes/EditPage.php: [SCAP] Revert "Toolbar: Only show on WikiText pages" (duration: 00m 08s)
12:36 godog: increase weight to 15 for mw1132 -> mw1148
10:00 _joe_: depooling mw1192, high CPU temperatures; we may need to check fan status
07:20 _joe_: powercycling mw1192, blank console, unresponsive
07:02 springle: removed all-but-latest large slow logs on elastic1004 and elastic1014
06:22 springle: removed txt files filling up db1047 /tmp, looked like analytics SELECT INTO OUTFILE, dated mid-August
05:58 springle: dump s3 db1035 to db1069:3313
05:37 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1067, warm up (duration: 00m 08s)
04:59 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1067 (duration: 00m 07s)
03:35 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1035 (duration: 00m 07s)
03:16 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Sep 2 03:15:22 UTC 2014 (duration 15m 21s)
02:53 springle: restarted dbstore1002 mysqld for upgrade
02:26 logmsgbot: LocalisationUpdate completed (1.24wmf19) at 2014-09-02 02:25:36+00:00
02:15 logmsgbot: LocalisationUpdate completed (1.24wmf18) at 2014-09-02 02:14:30+00:00

September 1

23:00 Krinkle: Running extensions/GlobalCssJs/removeOldManualUserPages.php per m:GlobalCssJs
21:50 ori: disabled gerrit account Caothu9669; spam
19:12 Reedy: Deleted php-1.24wmf[6-8] from apaches via dsh
19:01 logmsgbot: reedy Purged l10n cache for 1.24wmf13
19:01 logmsgbot: reedy Purged l10n cache for 1.24wmf14
19:00 logmsgbot: reedy Purged l10n cache for 1.24wmf15
18:59 logmsgbot: reedy Purged l10n cache for 1.24wmf16
18:58 logmsgbot: reedy Purged l10n cache for 1.24wmf17
16:44 ottomata: removed some large slow query logs from elastic* nodes, need to look into this...
12:10 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1044, take 2 (duration: 00m 06s)
12:04 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1044 (duration: 00m 06s)
11:31 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: pool db1044, warm up (duration: 00m 06s)
09:31 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1027 (duration: 00m 07s)
07:27 godog: deploy latest ring to swift eqiad-prod
07:11 godog: powercycle ms-be1010 "cpu soft lockup" on console
05:28 springle: xtrabackup clone db1027 to db1044
05:26 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1027 while cloning (duration: 00m 07s)
03:15 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Sep 1 03:14:22 UTC 2014 (duration 14m 21s)
02:28 logmsgbot: LocalisationUpdate completed (1.24wmf19) at 2014-09-01 02:27:39+00:00
02:17 logmsgbot: LocalisationUpdate completed (1.24wmf18) at 2014-09-01 02:16:18+00:00

August 31

19:56 hashar: Jenkins updated HHVM to (3.3-dev+20140728+wmf5) over (3.3-dev+20140728+wmf4)
14:38 bblack: restarted apache on strontium
14:34 bblack: restarted apache on tungsten, machine is overloaded
03:15 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Aug 31 03:14:41 UTC 2014 (duration 14m 40s)
02:29 logmsgbot: LocalisationUpdate completed (1.24wmf19) at 2014-08-31 02:28:42+00:00
02:18 logmsgbot: LocalisationUpdate completed (1.24wmf18) at 2014-08-31 02:17:24+00:00
02:00 ori: Stopped HHVM jobrunner and disabled Puppet on mw1053 due to bug 70177.

August 30

08:18 godog: restart mailman on sodium, pending https://gerrit.wikimedia.org/r/#/c/156766/
08:01 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1071, warm up (duration: 00m 07s)
06:33 jgage: analytics1021 back in service after election
06:14 jgage: upgraded & rebooted analytics1021
06:01 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: return db1037 to normal load (duration: 00m 06s)
05:03 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool es1001 (duration: 00m 06s)
04:06 springle: upgrade es1001 to mariadb 10
03:56 springle: xtrabackup clone db1037 to db1071
03:55 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: reduce db1037 load while cloning (duration: 00m 06s)
03:31 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: pool db1073, depool db1071 (duration: 00m 07s)
03:19 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Aug 30 03:17:57 UTC 2014 (duration 17m 56s)
02:33 logmsgbot: LocalisationUpdate completed (1.24wmf19) at 2014-08-30 02:32:06+00:00
02:21 logmsgbot: LocalisationUpdate completed (1.24wmf18) at 2014-08-30 02:20:39+00:00

August 29

21:15 logmsgbot: ori Synchronized wmf-config: I812c0bb6c: Scrap unused Twemproxy config files (duration: 00m 04s)
21:12 logmsgbot: ori updated /a/common to I812c0bb6c: Scrap unused Twemproxy config files
21:03 mutante: restarted uwsgi on tungsten
20:49 mutante: tungsten extremely busy, graphite down, logging in since 5 minutes :p
20:48 mutante: powercycling ms-be1006 - BUG: soft lockup - CPU#0 stuck ...
19:19 mutante: installing package upgrades on iron, bast1001
18:41 logmsgbot: aaron Synchronized php-1.24wmf18/maintenance/findMissingFiles.php: 994d4a556a070156fd04fb4951492f10696cc63c (duration: 00m 03s)
18:30 logmsgbot: ori Synchronized php-1.24wmf19/resources/src/mediawiki.action/mediawiki.action.view.redirect.js: I19221a25a: mediawiki.action.view.redirect: Work around a IE 10+ HTML5 history API bug (duration: 00m 06s)
18:30 logmsgbot: ori Synchronized php-1.24wmf18/resources/src/mediawiki.action/mediawiki.action.view.redirect.js: I19221a25a: mediawiki.action.view.redirect: Work around a IE 10+ HTML5 history API bug (duration: 00m 07s)
15:36 hashar_: Jenkins: pooled a new slave 10.68.16.162 as wikidata-jenkins3 on behalf of addshore / wmde
15:04 _joe_: shutting down mw1163, filed RT 8243 for repair.
14:54 _joe_: re-enabled mw1130
14:41 logmsgbot: aude Synchronized wmf-config/Wikibase.php: enable Wikibase badges css, follow up from last night deploy (duration: 00m 06s)
14:22 _joe_: syncing mw1130
14:06 _joe_: disable mw1130 from the api pool while it gets resynced
12:30 Krinkle: Running extensions/GlobalCssJs/removeOldManualUserPages.php per m:GlobalCssJs
11:06 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1070 (duration: 00m 09s)
08:04 hashar: Jenkins: in the jenkins-job-builder-config branch 'cloudbees' has been merged in 'master'. Unifying CI and browser tests jobs! \O/
07:05 _joe_: re-enabling puppet on the jobrunner, to check if the luasandbox fix works
06:33 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: return db1056 to normal load (duration: 00m 06s)
04:14 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Aug 29 04:13:03 UTC 2014 (duration 13m 2s)
03:31 springle: xtrabackup clone db1056 to db1070
03:31 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: reduce db1056 load while cloning (duration: 00m 06s)
03:15 logmsgbot: LocalisationUpdate completed (1.24wmf19) at 2014-08-29 03:10:26+00:00
02:48 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1070. pool db1072. (duration: 00m 07s)
02:38 logmsgbot: LocalisationUpdate completed (1.24wmf18) at 2014-08-29 02:37:20+00:00
01:49 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1070. pool db1072. (duration: 00m 06s)
01:27 godog: repool ms-fe1002
01:06 cmjohnson1: shutting down ms-fe1002 to relocate racks
01:04 godog: depool ms-fe1002
01:02 godog: repool ms-fe1001
00:57 logmsgbot: ori Synchronized php-1.24wmf18/extensions/WikimediaEvents: Ib44fe0898: Inject 'wgPoweredByHHVM' JS config var if powered by HHVM (duration: 00m 03s)
00:56 logmsgbot: ori Synchronized php-1.24wmf19/extensions/WikimediaEvents: Ib44fe0898: Inject 'wgPoweredByHHVM' JS config var if powered by HHVM (duration: 00m 04s)
00:38 cmjohnson1: shutting down ms-fe1001 for rack relocation
00:34 godog: depool ms-fe1001
00:32 godog: repool ms-fe1004
00:27 mutante: restarting gmetad on nickel
00:04 cmjohnson1: shutting down ms-fe1004 to relocate racks

August 28

23:58 godog: depool ms-fe1004
23:51 godog: repooling ms-fe1003
23:40 logmsgbot: maxsem Synchronized php-1.24wmf19/maintenance/: https://gerrit.wikimedia.org/r/#/c/156979/ (duration: 00m 04s)
23:39 logmsgbot: maxsem Synchronized php-1.24wmf19/includes/: https://gerrit.wikimedia.org/r/#/c/156979/ (duration: 00m 06s)
23:38 logmsgbot: maxsem Synchronized php-1.24wmf19/extensions/Flow/: https://gerrit.wikimedia.org/r/#/c/156994/ (duration: 00m 05s)
23:36 logmsgbot: maxsem Synchronized php-1.24wmf19/extensions/Echo: https://gerrit.wikimedia.org/r/#/c/157008/ (duration: 00m 04s)
23:35 logmsgbot: maxsem Synchronized php-1.24wmf19/extensions/Thanks/: https://gerrit.wikimedia.org/r/#/c/156898/ (duration: 00m 04s)
23:34 logmsgbot: maxsem Synchronized php-1.24wmf19/extensions/MobileFrontend/: https://gerrit.wikimedia.org/r/#/c/156968/ (duration: 00m 05s)
23:30 logmsgbot: maxsem Synchronized php-1.24wmf18/extensions/GlobalCssJs/: https://gerrit.wikimedia.org/r/#/c/157009/ (duration: 00m 04s)
23:27 K4-713: Updated fraud filters on payments
22:52 logmsgbot: aaron Synchronized php-1.24wmf18/maintenance/findMissingFiles.php: (no message) (duration: 00m 07s)
22:15 mutante: restarted tools.morebots production instance - can i log now?
22:13 cmjohnson1: ms-fe1003 down for relocation
22:13 mutante: test
21:15 robh: bast2001.wikimedia.org now online in codfw.
21:15 robh: i never admin logged when install2001.wikimedia.org went online the other day, oops.
21:15 ori: last sync was of Iac37a2369: resourceloader: Don't register raw modules client-side
21:14 logmsgbot: ori Synchronized php-1.24wmf18/includes/resourceloader/ResourceLoaderStartUpModule.php: (no message) (duration: 00m 03s)
20:57 logmsgbot: krinkle Synchronized php-1.24wmf19/includes/resourceloader/ResourceLoaderStartUpModule.php: fd5b963458c19 (duration: 00m 06s)
20:33 ottomata: shutting down elastic1016
20:16 ottomata: temporarily disable puppet on gadolinium
19:36 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 14s)
19:12 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.24wmf19
19:09 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikipedias to 1.24wmf18
19:07 logmsgbot: reedy Finished scap: testwiki to 1.24wmf19 (duration: 43m 00s)
18:25 godog: install build-essential and fakeroot on tin
18:24 logmsgbot: reedy Started scap: testwiki to 1.24wmf19
17:26 logmsgbot: aaron Synchronized rpc: 9564e93ecd4953126d91b99d7728f63401a4dc86 (duration: 00m 07s)
17:13 ^d: elastic: excluded the elastic1016 node from shard allocation, shards draining so we can take it down for disk testing
16:01 ottomata: restarted webstats-collector on gadolinium
13:18 mark: Reactivated cr2-eqiad AS3257 transit link
10:44 springle: xtrabackup clone db1051 to db1073
10:18 godog: restarting mailman on sodium
08:52 godog: restarted apache on mw1134
08:03 godog: killed stray mailman processes on sodium (no pid file) and restarted mailman
06:11 springle: xtrabackup clone db1051 to db1072
06:09 springle: restarted morebots

August 26

21:04 hashar: Updating our Jenkins Job Builder fork 0268581..e5c0c61 . Will let us define variables in 'default' section and override them when invoking a job template ( https://review.openstack.org/#/c/100020/ )
19:58 bd808: Ran sync-common on mw1053.eqiad.wmnet to recover from failure during last scap
19:48 logmsgbot: aude Finished scap: Update new messages for Wikibase (duration: 07m 16s)
19:41 logmsgbot: aude Started scap: Update new messages for Wikibase
19:39 logmsgbot: aude Synchronized wmf-config/Wikibase.php: add Wikibase badges css setting (duration: 00m 10s)
19:25 logmsgbot: aude Synchronized wmf-config/Wikibase.php: enable new serialization format for wikidata (duration: 00m 08s)
19:10 logmsgbot: reedy Synchronized php-1.24wmf18/extensions/Echo/: (no message) (duration: 00m 14s)
19:05 logmsgbot: aude Synchronized wmf-config/Wikibase.php: enable otherprojects sidebar beta feature (duration: 00m 15s)
18:54 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias to 1.24wmf18
18:53 logmsgbot: reedy Synchronized php-1.24wmf18/extensions/MassMessage: (no message) (duration: 00m 14s)
18:52 logmsgbot: reedy Synchronized php-1.24wmf17/extensions/MassMessage: (no message) (duration: 00m 16s)
18:19 jgage: Failover from analytics1010-eqiad-wmnet to analytics1004-eqiad-wmnet successful
17:47 logmsgbot: bd808 Synchronized private/PrivateSettings.php: Syncing file rather than symlink (duration: 00m 04s)
17:36 bd808: mw1010.eqiad.wmnet was out of sync too. I suspect there is something wrong with the fanout update step in scap
17:26 bd808: /usr/local/apache/common-local out of date on mw1161.eqiad.wmnet; updated via sync-common
17:25 bd808: sync-* not updating terbium properly; sync-common from terbium manually got several config changes; maybe a problem with mw1161.eqiad.wmnet rsync mirror
17:14 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: touch (duration: 00m 04s)
17:11 logmsgbot: demon Synchronized wmf-config/PrivateSettings.php: adjust swift auth url for cirrus (duration: 00m 04s)
17:05 cmjohnson: swapping failed disk labsdb1003 slot 1
16:42 bd808: Ran sync-common on osmium to verify that it now rebuilds l10n cache by default (and it does!)
16:36 legoktm: running removeOldManualUserPages.php (GlobalCssJs) for users who requested it
16:29 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: Again, with feeling (duration: 00m 04s)
16:26 logmsgbot: bd808 Finished scap: no-op scap to test scap code update (duration: 13m 31s)
16:20 bd808|DEPLOY: Rsync sloooow to fenari "16:18:52 fenari INFO - Finished rsync common (duration: 04m 38s)"
16:12 logmsgbot: bd808 Started scap: no-op scap to test scap code update
16:07 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 04s)
16:07 bd808|DEPLOY: Updated scap to 116027f (Make sync-common update l10n cdb files by default)
15:05 logmsgbot: anomie Synchronized wmf-config: SWAT: Enable GlobalCssJs on all CentralAuth wikis minus loginwiki gerrit:154432 (duration: 00m 09s)
13:32 hashar: Jenkins mediawiki-core-qunit job has been switched to Zuul cloner and pass! :-D
13:29 _joe_: re-enabling puppet, change aborted as not all sites are served via hhvm on the hhvm appservers (true story). Will re-do once all configs are in their place
13:12 _joe_: disabling puppet on all appservers while deploying an apache change
12:48 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: db1054 to normal load (duration: 00m 06s)
12:33 hashar: Jenkins reverted mediawiki-core-qunit to use Zuul cloner 156268. Gotta play with it on a new job name since it does not work out of the box as expected.
12:12 hashar: Jenkins migrating mediawiki-core-qunit to use Zuul cloner 156268
12:03 akosiaris: disable puppet on labsdb1006 for planet osm import
11:53 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: pool db1054, warm up (duration: 00m 08s)
09:04 godog: reboot ms-be1011, unresponsive on network and console
08:28 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1036 (duration: 00m 06s)
05:41 springle: xtrabackup clone db1036 to db1054
05:39 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1036 while cloning (duration: 00m 06s)
05:28 springle: upgrade & restart db1054, fs check
04:48 logmsgbot: demon Synchronized wmf-config/InitialiseSettings-labs.php: (no message) (duration: 00m 06s)
04:27 springle: labsdb1002 back up
04:07 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Aug 26 04:06:34 UTC 2014 (duration 6m 33s)
03:23 ^d: restarting elasticsearch on elastic1001, elastic1003 and elastic1008. icinga may complain briefly.
03:11 springle: filesystem issues on labsdb1002. stopped mysqld
03:05 logmsgbot: LocalisationUpdate completed (1.24wmf18) at 2014-08-26 03:04:18+00:00
02:34 logmsgbot: LocalisationUpdate completed (1.24wmf17) at 2014-08-26 02:33:00+00:00

August 25

23:58 logmsgbot: ori Synchronized wmf-config/InitialiseSettings.php: Add 'movefile' to 'eliminator' user group on jawiki (duration: 00m 03s)
23:53 logmsgbot: maxsem Finished scap: SWAT: CentralNotice update (duration: 29m 58s)
23:23 logmsgbot: maxsem Started scap: SWAT: CentralNotice update
23:19 logmsgbot: maxsem Synchronized php-1.24wmf17/extensions/CentralNotice/: https://gerrit.wikimedia.org/r/#/c/156188/ (duration: 00m 05s)
23:17 logmsgbot: maxsem Synchronized php-1.24wmf18/extensions/CentralNotice/: https://gerrit.wikimedia.org/r/#/c/156188/ (duration: 00m 04s)
23:15 logmsgbot: maxsem Synchronized php-1.24wmf18/includes/htmlform/HTMLCheckField.php: https://gerrit.wikimedia.org/r/#/c/156015/ (duration: 00m 05s)
20:06 subbu: deployed parsoid version 5b5a5ed5
17:24 godog: reboot ms-be1004 to pick up kernel upgrade
17:13 godog: rebooting ms-be1002 to pick up updated kernel
16:54 ottomata: stopping puppet on cp3021. Testing an increase of kafka.queue.buffering.max.ms in order to avoid dropping messages during broker metadata change (e.g. leader elections)
16:48 hashar: Jenkins pooled in a new slave wdjenkins-node1 that will be used to run Wikidata jenkins jobs. Work in progress with addshore. It is not running jobs yet.
16:47 godog: reboot ms-be1011, xfsaild errors in dmesg
16:25 hashar: Jenkins: disconnecting and reconnecting Gearman plugin from https://integration.wikimedia.org/ci/configure
16:06 andrewbogott: wikitech deployment finished. Note that the OpenStackManager submodule is off of the MediaWiki branch because… the whole submodule setup there is a bit broken on account of a git bug that uses absolute paths to manage submodules.
16:01 andrewbogott: deploying tiny OpenStackManager upgrade on wikitech
15:58 ottomata: enabled elasticsearch shard allocation row awareness (via rest api)
12:45 hashar: hard stopped/restarted Zuul (workflow config error)
12:27 hashar: restarting zuul
10:15 mark: setup cross-confederation BGP sessions from AS65001 (eqiad) to AS65002 (codfw)
05:35 logmsgbot: hoo Synchronized wmf-config/InitialiseSettings.php: 156076 - Remove centralnotice-admin right assignments on 3 wikis - Basically a noop (duration: 00m 06s)
03:15 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Aug 25 03:14:13 UTC 2014 (duration 14m 12s)
02:27 logmsgbot: LocalisationUpdate completed (1.24wmf18) at 2014-08-25 02:25:58+00:00
02:15 logmsgbot: LocalisationUpdate completed (1.24wmf17) at 2014-08-25 02:14:14+00:00

August 24

23:17 ^d: slow indexing log going pretty bonanzas on elastic101[35]. Probably others too? Filling /var/log.
12:02 mark: Removed IPv6 subnet 2620:0:860:2::/64 from cr2-pmtpa:irb.101
03:18 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Aug 24 03:17:46 UTC 2014 (duration 17m 45s)
02:32 logmsgbot: LocalisationUpdate completed (1.24wmf18) at 2014-08-24 02:31:13+00:00
02:19 logmsgbot: LocalisationUpdate completed (1.24wmf17) at 2014-08-24 02:18:52+00:00

August 23

11:33 mark: Manually removed IPv6 addresses from fenari
11:23 mark: Deactivated IPv6 router-advertisement on cr2-pmtpa
11:21 mark: Manually removed IPv6 address from mchenry
10:36 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1004. pool db1053. (duration: 00m 07s)
03:07 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Aug 23 03:06:13 UTC 2014 (duration 6m 12s)
02:23 logmsgbot: LocalisationUpdate completed (1.24wmf18) at 2014-08-23 02:21:59+00:00
02:18 logmsgbot: LocalisationUpdate completed (1.24wmf17) at 2014-08-23 02:17:28+00:00
01:31 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1056 (duration: 00m 06s)
00:29 ori: disabled puppet on osmium again to debug a leak; please don't re-enable

August 22

18:10 logmsgbot: ori updated /a/common to I338d72a47: Do not define MEDIAWIKI before loading WebStart.php
17:43 ottomata: moving sqstat udp2log filter from analytics1003 to analytics1026, reqstats might blip for a sec...
17:41 ori: nuking /srv/deployment/rcstream on rcs1002 to verify trebuchet package provider reprovisions it
15:54 springle: xtrabackup db1056 to db1053
15:53 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1056 while cloning (duration: 00m 07s)
15:33 ^d: elastic1008: fixed /etc/hosts to point to actual IP instead of loopback
15:18 springle: upgrade & restart db1053, fs check
15:08 bd808: Still no apache2.log on fluorine or in logstash. Log seems to be available on fenari.
14:51 springle: switched s1 sanitarium and labsdb replication to db1069:3311 mariadb 10
14:39 mark: Removed IPv6 subnets 2620:0:860:1::/64 (squid subnet) and 2620:0:860:3::/64 (sandbox subnet) from cr2-pmtpa configuration
04:11 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Aug 22 04:10:47 UTC 2014 (duration 10m 46s)
03:19 logmsgbot: LocalisationUpdate completed (1.24wmf18) at 2014-08-22 03:18:44+00:00
02:33 logmsgbot: LocalisationUpdate completed (1.24wmf17) at 2014-08-22 02:32:11+00:00
00:03 logmsgbot: ori Finished scap: SWAT: d3de89777, 7abfe0d5e7, 8ec9853c32b, 476e9e90bd01 (duration: 06m 29s)

August 21

23:57 logmsgbot: ori Started scap: SWAT: d3de89777, 7abfe0d5e7, 8ec9853c32b, 476e9e90bd01
21:58 logmsgbot: ori Synchronized php-1.24wmf17/resources/src/mediawiki/mediawiki.js: I8d27442d1: Workaround for bug introduced by Icf6ede09b (duration: 00m 03s)
21:57 manybubbles: performing elasticsearch upgrade on elastic1015
21:02 logmsgbot: ori Synchronized php-1.24wmf17/resources/src/mediawiki/mediawiki.util.js: Touch resources/src/mediawiki/mediawiki.util.js (duration: 00m 06s)
20:44 godog: rolling restart of swift-proxy on ms-fe1*
20:11 godog: restarted swift-proxy on ms-fe1001
19:55 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 13s)
19:49 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.24wmf18
19:46 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikipedias to 1.24wmf17
19:44 logmsgbot: reedy Finished scap: testwiki to 1.24wmf18 (duration: 34m 01s)
19:31 mutante: disabled mw1178 in pybal
19:27 godog: restarted memcached on ms-fe1004
19:23 reedy|webirc: mw1178 returned [255]: ssh: connect to host mw1178 port 22: Connection timed out
19:23 reedy|webirc: mw1019 returned [127]: bash: sync-common: command not found
19:09 logmsgbot: reedy Started scap: testwiki to 1.24wmf18
18:28 manybubbles: *victim*
18:27 manybubbles: trying to recover from weird Elasticsearch upgrade failure by redoing the upgrade on one node while also blowing away the data directory during the upgrade. elastic1005, you are my first victem.
17:28 cmjohnson1: removing mw1130 from pybal
14:53 hashar: Jenkins: updated PHP CodeSniffer MediaWiki standard on all slaves.
14:36 hashar_: Jenkins: updating mediawiki code sniffer repo bf82117..bc4e590
10:02 hashar: Jenkins installed plugin Throttle Concurrent Builds.
03:21 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Aug 21 03:20:47 UTC 2014 (duration 20m 46s)
02:36 logmsgbot: LocalisationUpdate completed (1.24wmf17) at 2014-08-21 02:34:56+00:00
02:20 logmsgbot: LocalisationUpdate completed (1.24wmf16) at 2014-08-21 02:19:26+00:00
00:08 MatmaRex: (manybubbles contd.) …a single node going down but I expect the cluster to stay "yellow" during the process- no alerts.
00:07 manybubbles: bd808 needs to plan a logstash upgrade soon - let it be logged
00:05 manybubbles: if anyone is reading the SAL for fun or sees an error in Elasticsearch cluster in the next 24 hours - we're performing an elasticsearch upgrade. We've set it up this time so it's super slow and boring. So boring I'm going to sleep through it. If you see more than transient complaining from icinga about elasticsearch you can call me/have someone with access to the contact list call me. I expect icinga to complain about a
00:00 manybubbles: unattended rolling restart of Elasticsearch cluster is going just fine - adding the 30 minute sleep between servers and turning down the replication rate makes it pretty boring.

August 20

23:07 awight: stopping the Thank You job
22:50 ori: disabled puppet on osmium to debug memory leak
21:46 logmsgbot: marktraceur Synchronized php-1.24wmf17/extensions/MultimediaViewer/: Add disable-by-default option to MultimediaViewer (duration: 00m 07s)
21:09 logmsgbot: marktraceur Synchronized wmf-config: Turn off Media Viewer for logged-in users at Commons. (duration: 00m 07s)
21:06 logmsgbot: marktraceur updated /a/common to I226bd1468: Add item-redirect to OAuth permissions
19:50 hashar: Restarting Zuul to prettify build results bug 66095
19:48 logmsgbot: awight Synchronized php-1.24wmf17/extensions/CentralNotice: push CentralNotice updates, including new hide cookie format (duration: 00m 05s)
19:47 logmsgbot: awight Synchronized php-1.24wmf16/extensions/CentralNotice: push CentralNotice updates, including new hide cookie format (duration: 00m 04s)
19:46 logmsgbot: awight Synchronized php-1.24wmf16/extensions/CentralNotice: push CentralNotice updates, including new hide cookie format (duration: 00m 07s)
16:11 manybubbles: elastic1001 upgrade went well - upgrading elastic1002 now
15:48 hashar: dns: Jenkins will now complain whenever you attempt to send tabs in any file of operations/dns.git bug 69478
15:17 manybubbles: manually lowered elasticsearch recovery speeds to stem off high load caused by healing the restart of elastic1001 - we were slowing down enough that we were filling the pool counter
15:05 logmsgbot: anomie Synchronized wmf-config/CommonSettings.php: SWAT: Add item-redirect to OAuth permissions gerrit:155257 (duration: 00m 09s)
15:01 logmsgbot: anomie Synchronized php-1.24wmf17/extensions/Wikidata/extensions/Wikibase/lib/: SWAT: Touch files on advice of Wikidata folks (duration: 00m 09s)
15:01 logmsgbot: anomie Synchronized wmf-config/Wikibase.php: SWAT: Fix config for specialSiteLinkGroups in Wikibase gerrit:155218 (duration: 00m 09s)
14:49 manybubbles: installing elasticsearch 1.3.2 on elasticsearch1001 only right now as a test
14:47 manybubbles: upgrading elasticsearch plugins on all elasticsearch servers in preparation to upgrade to elasticsearch 1.3 - if we roll back we'll have to redeploy the plugins
14:10 ottomata: changing group ownership and permissions on raw webrequest data in hdfs. Users now must be in the analytics-privatedata-users group to access.
13:47 manybubbles: experimenting with lowering merge factor on enwiki's Cirrus index - should improve query performance at the cost of more background tasks in the Elasticsearch cluster
13:36 ottomata: disabling puppet on analytics1027 temporarily
13:10 godog: reboot ms-be1003, xfs errors/panics
12:03 logmsgbot: ori updated /a/common to Ic3fe1ef83: Update all symlinks to /apache
11:36 hashar: Updating Jenkins Job Builder fork 666e953..0268581
11:06 hashar_: mw1019 is missing sync-common causing sync issues.
11:06 logmsgbot: hashar Synchronized wmf-config/InitialiseSettings.php: new domain www.veikkos-archiv.com to wgCopyUploadsDomains 155239 bug 69777 (duration: 00m 03s)
11:05 logmsgbot: hashar Synchronized wmf-config/InitialiseSettings.php: new domain www.veikkos-archiv.com to wgCopyUploadsDomains 155239 bug 69777 (duration: 00m 03s)
10:33 logmsgbot: ori Synchronized w/touch.php: Ic9d8837b1: Canonicalize some remaining references to /apache symlink (duration: 00m 05s)
10:33 logmsgbot: ori Synchronized w/mobilelanding.php: Ic9d8837b1: Canonicalize some remaining references to /apache symlink (duration: 00m 05s)
10:26 logmsgbot: ori updated /a/common to Ic9d8837b1: Canonicalize some remaining references to /apache symlink
10:16 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: Id2d5cfa4c: Canonicalize path to $wgSiteMatrixFile (duration: 00m 06s)
09:40 godog: uploaded hhvm_3.3-dev+20140728+wmf5 to carbon
09:27 hashar: restarted Jenkins Gearman plugin.
04:06 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Aug 20 04:05:41 UTC 2014 (duration 5m 40s)
03:12 logmsgbot: LocalisationUpdate completed (1.24wmf17) at 2014-08-20 03:11:08+00:00
02:40 logmsgbot: LocalisationUpdate completed (1.24wmf16) at 2014-08-20 02:39:40+00:00

August 19

23:17 logmsgbot: catrope Synchronized php-1.24wmf17/extensions/MobileFrontend: (no message) (duration: 00m 04s)
23:14 logmsgbot: catrope Synchronized wmf-config/InitialiseSettings.php: Set wmgWikibaseSiteGroup for wikinews (duration: 00m 05s)
22:59 logmsgbot: aude Synchronized php-1.24wmf17/extensions/Wikidata: Fix badges css on Wikidata (duration: 00m 11s)
22:30 logmsgbot: aude Finished scap: Update Wikidata, WikimediaMessages and ZeroBanner (duration: 22m 02s)
22:08 logmsgbot: aude Started scap: Update Wikidata, WikimediaMessages and ZeroBanner
22:03 logmsgbot: aude Synchronized php-1.24wmf17/extensions/ZeroBanner: Update, per yurik (duration: 00m 18s)
21:22 logmsgbot: aude Synchronized wikidataclient.dblist: Enable Wikibase on Wikinews (duration: 00m 08s)
21:21 logmsgbot: aude Synchronized wmf-config: Config changes to enable Wikibase on Wikinews (duration: 00m 14s)
21:12 aude: added and populated sites table for wikinews
21:05 logmsgbot: aude Synchronized php-1.24wmf17/extensions/Wikidata: Fix badges css and populateSitesTable script in Wikibase (duration: 00m 14s)
20:26 RoanKattouw: Restarting Jenkins, it seems to be stuck
19:58 logmsgbot: aude Synchronized wmf-config/InitialiseSettings.php: enabling Wikidata to also be a client, e.g. use lua (duration: 00m 09s)
19:58 logmsgbot: aude Synchronized wmf-config/Wikibase.php: enabling Wikidata to also be a client, e.g. use lua (duration: 00m 12s)
19:53 logmsgbot: aude Synchronized php-1.24wmf17/extensions/Wikidata/extensions/Wikibase/lib/resources/: (no message) (duration: 00m 11s)
19:48 logmsgbot: aude Synchronized php-1.24wmf17/extensions/Wikidata/extensions/Wikibase/lib/includes/modules/SitesModule.php: (no message) (duration: 00m 09s)
19:48 logmsgbot: aude Synchronized wmf-config/Wikibase.php: fix config for special site links on Wikidata (duration: 00m 11s)
19:37 logmsgbot: aude Synchronized php-1.24wmf17/extensions/Wikidata/extensions/Wikibase/lib/includes/modules/SitesModule.php: (no message) (duration: 00m 11s)
19:26 logmsgbot: aude Synchronized wmf-config/Wikibase.php: allow adding site links to Wikidata (non-entity) pages on Wikidata (duration: 00m 08s)
19:21 logmsgbot: aude Synchronized wmf-config/Wikibase.php: enable item redirects on Wikidata (duration: 00m 08s)
19:16 logmsgbot: aude Synchronized wmf-config/Wikibase.php: enable badges on Wikidata (duration: 00m 08s)
19:06 logmsgbot: demon rebuilt wikiversions.cdb and synchronized wikiversions files: group1 back to wmf17
19:05 logmsgbot: demon Synchronized php-1.24wmf17/extensions/Wikidata/extensions/Wikibase/lib/includes/changes/EntityChange.php: (no message) (duration: 00m 05s)
18:33 logmsgbot: demon rebuilt wikiversions.cdb and synchronized wikiversions files: all group1 back to wmf16 until WB patch comes
18:22 andrewbogott: added virt1009 to the eqiad virt cluster
18:17 logmsgbot: demon rebuilt wikiversions.cdb and synchronized wikiversions files: wikidatawiki back to wmf16
18:13 logmsgbot: demon rebuilt wikiversions.cdb and synchronized wikiversions files: group1 to wmf17
15:17 logmsgbot: anomie Synchronized wmf-config/InitialiseSettings.php: SWAT: CopyUploadDomains for Commons gerrit:154718 (duration: 00m 12s)
15:15 logmsgbot: anomie Synchronized commonsuploads.dblist: SWAT: Remove emlwiki from commonsuploads.dblist gerrit:154714 (duration: 00m 09s)
15:13 logmsgbot: anomie Synchronized wmf-config/InitialiseSettings.php: SWAT: Add Affiliate namespace on chapcomwiki gerrit:154713 (for real this time) (duration: 00m 09s)
15:12 mark: Completed network migration of BGP confederation renumbering: AS65002 -> AS65001, AS65003 -> AS65004, old AS65001 (pmtpa) is part of eqiad for its remaining lifetime
15:12 logmsgbot: anomie Synchronized wmf-config/InitialiseSettings.php: SWAT: Add Affiliate namespace on chapcomwiki gerrit:154713 (duration: 00m 09s)
15:10 logmsgbot: anomie Synchronized php-1.24wmf16/includes/parser/Parser.php: SWAT: Fix URL protocol detection regex for file link= parameter gerrit:154844 (duration: 00m 09s)
15:04 logmsgbot: anomie Synchronized php-1.24wmf17/includes/parser/Parser.php: SWAT: Fix URL protocol detection regex for file link= parameter gerrit:154845 (duration: 00m 09s)
14:50 ottomata: starting stat1003 upgrade to trusty
14:37 logmsgbot: demon updated /a/common to I035cebe20: Configure swift-backed snapshots for Cirrus in beta
14:05 logmsgbot: demon Synchronized wmf-config/CirrusSearch-labs.php: beta swift config, no-op (duration: 00m 04s)
13:39 hashar_: Jenkins upgrading hhvm on the Trusty Jenkins slave integration-slave1006-trusty : Unpacking hhvm (3.3-dev+20140728+wmf4) over (3.3-dev+20140728+wmf3)
13:12 logmsgbot: hashar Synchronized wmf-config/InitialiseSettings.php: Fix $wgRestrictionLevels ordering bug 69640 (duration: 00m 04s)
10:19 hashar: Jenkins: bringing back irc bot wmf-insecte in #wikimedia-qa . Will be used to notify failures/fixes of the beta cluster jenkins jobs
09:58 godog: depool mw1019 from appservers, testing trusty+hhvm reinstall RT #8153
07:39 bblack: strontium ok, icinga-wm back
07:17 hashar: Jenkins: manually cleared out a tmpfs partition on lanthanum.eqiad.wmnet which was causing all MediaWiki / extensions jobs to fail completely. bug 69731. We need disk space monitoring which is bug 69733.
07:09 bblack: ... and strontium passenger is failing to start up correctly again. icinga-wm disabled to avoid spam
07:07 bblack: restarted apache2 service on strontium/palladium, expect another small spike of puppet fail->ok
03:21 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Aug 19 03:20:21 UTC 2014 (duration 20m 20s)
02:37 logmsgbot: LocalisationUpdate completed (1.24wmf17) at 2014-08-19 02:36:21+00:00
02:16 logmsgbot: LocalisationUpdate completed (1.24wmf16) at 2014-08-19 02:15:14+00:00

August 18

23:03 andrewbogott: isolated virt1006, re-enabling puppet on virt1000 and virt1006
22:36 andrewbogott: disabling puppet on virt1000 and virt1006 while I try to convince the scheduler to overlook virt1006
22:01 bblack: done futzing w/ puppetmasters+neon, all agents enabled and bot back online
21:28 hashar: Zuul processing again. Definitely need to write doc about how to unstuck it
21:02 hashar: Zuul / Jenkins stalled again :-/
19:35 bblack: testing new passenger perf params on strontium/palladium. agents on those two and icinga-wm still disabled
19:04 bblack: restarted service apache2 on strontium - passenger for puppet master was dead again
17:00 andrewbogott: added a (yuvi-built) python-txstatsd package to trusty on Carbon.
16:37 bd808: deployment-prep Restarted Apache and HHVM on deployment-mediawiki02 to pick up removal of /etc/php5/conf.d/mail.ini
16:26 logmsgbot: yurik Synchronized php-1.24wmf17/extensions: Syncing JsonConfig,ZeroPortal,ZeroBanner (duration: 01m 13s)
16:22 logmsgbot: yurik Synchronized php-1.24wmf16/extensions: Syncing JsonConfig,ZeroPortal,ZeroBanner (duration: 01m 22s)
16:18 legoktm: migrateAccount.php finished, 2014-08-18 15:42:12 processed 1528652 usernames (22.9/sec), 10 (0.0%) fully migrated, 7938 (0.5%) partially migrated
16:05 hashar: Jenkins tox based jobs are now runnable in parallel 154834
15:36 manybubbles: swat complete
15:29 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: SWAT - enable cirrus optimization - weighted all fields - on group0 wikis (duration: 00m 07s)
15:29 logmsgbot: manybubbles Synchronized wmf-config/CirrusSearch-common.php: SWAT - drop unused Cirrus parameter (duration: 00m 05s)
15:25 logmsgbot: manybubbles Synchronized php-1.24wmf16/extensions/CentralAuth: SWAT - two centralauth fixes (duration: 00m 05s)
15:22 bblack: resuming slowly wiping varnish caches for mmap update (49 hosts to go), expect small 5xx spikes every ~1.5 hrs for the next few days
15:22 logmsgbot: manybubbles Synchronized wmf-config/: SWAT - noop - sync files adding bouncehandler to betalabs (duration: 00m 04s)
15:19 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: SWAT - create portal/portal talk namespaces on kowikisource (duration: 00m 04s)
15:18 logmsgbot: manybubbles Synchronized php-1.24wmf17/extensions/CentralAuth/: SWAT - two centralauth fixes (duration: 00m 04s)
15:13 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: SWAT - create eliminator role on viwiki (duration: 00m 05s)
15:11 logmsgbot: manybubbles Synchronized php-1.24wmf17/extensions/Wikidata/: (no message) (duration: 00m 07s)
15:08 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: SWAT - Add global-renamer group to metawiki (duration: 00m 04s)
14:50 hashar: Jenkins: reverting PHP CodeSniffer upgrade 154825. We are back to 1.4.7. Previous patch had some issues.
14:42 hashar: Jenkins: upgrading PHP Codesniffer from 1.4.7 to 1.4.8 (thanks to addshore 154053)
14:39 bd808: No apache2.log in fluorine:/a/mw-log; Last file in /a/mw-log/archive is apache2.log-20140816.gz
14:31 bd808: Restarted logstash on logstash1001; event volume was lower than expected
13:49 hashar: restarting zuul. Got stuck again.
13:29 hashar_: Restarted Zuul, some items were stuck in queue. Retrigger your jobs (revote +2 / new patchset / 'recheck' comment)
13:23 logmsgbot: reedy Synchronized php-1.24wmf17/extensions/ExtensionDistributor: Unbreak ExtensionDistributor (duration: 00m 13s)
13:18 hashar: Zuul stuck, looking.
13:06 Reedy: Large amount of incoming traffic to bast1001 is me uploading files
12:11 godog: rebalanced swift object ring in eqiad
09:34 godog: reenabled puppet on neon and started ircecho
09:23 godog: stop ircecho again on neon, disable puppet on neon
09:11 godog: restarted apache2 on strontium
08:58 godog: stopped ircecho on neon while diagnosing puppet failure
03:13 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Aug 18 03:12:27 UTC 2014 (duration 12m 26s)
03:06 hoo: Ran sync-common on mw1053 to stop "Unrecognized job type 'ChangeNotification'." exceptions
02:31 logmsgbot: LocalisationUpdate completed (1.24wmf17) at 2014-08-18 02:30:17+00:00
02:19 logmsgbot: LocalisationUpdate completed (1.24wmf16) at 2014-08-18 02:18:52+00:00

August 17

21:07 legoktm: running migrateAccount.php without --safe or --auto on terbium for bug 69291
18:45 hashar: Zuul upgraded
18:41 hashar: Upgrading Zuul to latest version (that is not a Friday after all)
09:22 springle: ongoing schema change wikidatawiki & testwikidatawiki wb_entity_per_page.epp_redirect_target. osc_host.sh processes on terbium ok to kill in emergency
04:34 ottomata: restarted udp2log on oxygen
03:05 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Aug 17 03:04:22 UTC 2014 (duration 4m 21s)
02:49 springle: killed stuff on labsdb1002 using all disk for temp tables. investigating
02:24 logmsgbot: LocalisationUpdate completed (1.24wmf17) at 2014-08-17 02:23:08+00:00
02:14 logmsgbot: LocalisationUpdate completed (1.24wmf16) at 2014-08-17 02:13:35+00:00

August 16

18:12 bblack: (amssq33: and yes, removing from fe/be cache pools)
18:11 bblack: powering off amssq33, it's clipping network traffic at peak times due to bad ethernet connection negotiated down to 100Mbps (see existing RT 7933 in esams queue)
18:02 bblack: ms-be1006: syslog indicates it started generating repeated "BUG: soft lockup" 10 minutes before dying, in XFS kernel code again...
17:55 bblack: rebooting ms-be1006, ping-dead in icinga for 23m, console was unresponsive
17:37 bblack: restarted apache2 on palladium... looks like something went horribly wrong with its puppet of itself that somehow killed off puppetmaster service?
03:07 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Aug 16 03:06:29 UTC 2014 (duration 6m 28s)
02:27 logmsgbot: LocalisationUpdate completed (1.24wmf17) at 2014-08-16 02:26:02+00:00
02:17 logmsgbot: LocalisationUpdate completed (1.24wmf16) at 2014-08-16 02:16:00+00:00

August 15

20:59 logmsgbot: kaldari Synchronized php-1.24wmf16/extensions/MobileFrontend/less: fixing iOS search bug (duration: 00m 05s)
17:58 logmsgbot: aude Synchronized wmf-config/Wikibase.php: Enable redirects on test.wikidata (duration: 00m 07s)
15:53 logmsgbot: aude Synchronized php-1.24wmf17/extensions/Wikidata: Update test.wikidata (duration: 00m 07s)
15:50 logmsgbot: aude Synchronized php-1.24wmf17/extensions/Wikidata: Fix database error and snak value display on test wikidata (duration: 00m 09s)
15:00 ori: re-enabled puppet on mw1017
13:33 ori: disabling puppet on mw1017 to test rsyslog config
03:51 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Aug 15 03:50:23 UTC 2014 (duration 50m 22s)
03:04 logmsgbot: LocalisationUpdate completed (1.24wmf17) at 2014-08-15 03:03:49+00:00
02:34 logmsgbot: LocalisationUpdate completed (1.24wmf16) at 2014-08-15 02:33:21+00:00
00:24 logmsgbot: ori Finished scap: SWAT: cherry picks for TMH and Echo (duration: 14m 38s)
00:09 logmsgbot: ori Started scap: SWAT: cherry picks for TMH and Echo

August 14

23:24 logmsgbot: aude Synchronized wmf-config/Wikibase.php: Bump cache epoch and add badges setting on test.wikidata (duration: 00m 32s)
23:13 logmsgbot: aude Finished scap: Update branch for test.wikidata (duration: 16m 48s)
22:57 logmsgbot: aude Started scap: Update branch for test.wikidata
22:26 logmsgbot: aaron Synchronized php-1.24wmf16/includes/DefaultSettings.php: 67bf481ce1644ff194d7565107d9b8ffe11bf4b7 (duration: 00m 07s)
22:23 logmsgbot: aaron Synchronized wmf-config/CommonSettings.php: Increased wgParsoidCacheUpdateTitlesPerJob to 12 to lower the backlog (duration: 00m 07s)
22:14 logmsgbot: aude scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki="test2wiki" --list-file="/a/common/wmf-config/extension-list" --output="/tmp/tmp.kFlVQdKnM2" ' returned non-zero exit status 255 (duration: 00m 40s)
22:13 logmsgbot: aude Started scap: Update branch for test.wikidata
21:49 logmsgbot: reedy Synchronized php-1.24wmf17/includes/context/RequestContext.php: (no message) (duration: 00m 15s)
21:10 godog: restarted hhvm on mw1053
20:47 _joe|away: stopping puppet, jobrunner on mw1053; HHVM is eating memory like godzilla
19:29 bblack: puppeting labmon1001, etc
18:57 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 14s)
18:55 logmsgbot: reedy Synchronized database lists: (no message) (duration: 00m 14s)
18:26 mutante: stopped ircecho on neon temporarily
18:10 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.24wmf17
18:05 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikipedias to 1.24wmf16
17:45 AaronSchulz: /srv/deployment/jobrunner updated to 795baf3ca4ce8308597dd74e5242aa5bfbbe961d
17:39 logmsgbot: aaron Synchronized rpc: 6c0ece687bb6ff3fec0ca7e80a587525ebf18a70 (duration: 00m 08s)
16:52 _joe_: uploaded new hhvm package 3.3-dev+20140728+wmf4
16:23 logmsgbot: reedy Synchronized php-1.24wmf17/extensions/CentralAuth/: (no message) (duration: 00m 13s)
16:23 logmsgbot: reedy Synchronized php-1.24wmf16/extensions/CentralAuth/: (no message) (duration: 00m 14s)
15:49 Reedy: Running sync-common on mw1053
15:48 logmsgbot: reedy Finished scap: testwiki to 1.24wmf17 (duration: 33m 13s)
15:47 Jeff_Green: adjust wiki-mail._domainkey DNS record to allow sending from 'wiki*@' addresses, instead of just wiki@
15:23 _joe_: powercycling mw1053, which looks like the victim of hhvm-induced ooms
15:15 logmsgbot: reedy Started scap: testwiki to 1.24wmf17
14:01 _joe_: puppet re-enabled on the appserver
12:38 _joe_: stopping puppet on appservers while deploying a delicate change.
12:12 manybubbles|away: cirrus index rebuilds are still proceeding without issue. Going to continue to let them run and keep half an eye on them. enwiki is nearly done. Commons and wikidata are done. Many of group1 are done - we're up to eswiktionary now - but there are many to go.
09:30 _joe_: the hhvm jobrunner is back in production, seems healthy, see https://logstash.wikimedia.org/#/dashboard/elasticsearch/hhvm_jobrunner
08:09 _joe_: reactivated the jobrunner on mw1053, with promising results. Puppetization pending (in ~ 1 hour)
03:12 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Aug 14 03:11:33 UTC 2014 (duration 11m 32s)
02:30 logmsgbot: LocalisationUpdate completed (1.24wmf16) at 2014-08-14 02:29:52+00:00
02:17 logmsgbot: LocalisationUpdate completed (1.24wmf15) at 2014-08-14 02:16:34+00:00

August 13

21:58 manybubbles: cirrus index rebuild is proceeding without trouble - I'm going to let it continue over night.
21:46 andrewbogott: re-enabled puppetmaster on virt1000; apache changes seem stable now.
21:18 _joe_: stopped puppet on virt1000, our fail
13:23 springle: killed a mass of SpecialWhatLinksHere queries on enwiki
12:51 manybubbles: restarting rebuilding Cirrus indexes to pick up weighted all field
10:35 godog: bump swift weights for ms-be1013 ms-be1014 ms-be1015 to 2500
08:38 hashar: gallium removing some sun-java6* packages coming from old lucid era
07:47 hashar: upgrading Java on contint servers gallium and lanthanum , restarting Jenkins related process
04:03 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Aug 13 04:02:23 UTC 2014 (duration 2m 22s)
03:12 logmsgbot: LocalisationUpdate completed (1.24wmf16) at 2014-08-13 03:11:38+00:00
02:41 logmsgbot: LocalisationUpdate completed (1.24wmf15) at 2014-08-13 02:40:36+00:00

August 12

23:34 logmsgbot: hoo Synchronized tests/multiversion/MWMultiVersionTest.php: (no message) (duration: 00m 11s)
23:32 logmsgbot: hoo Synchronized php-1.24wmf16/skins/Vector/skinStyles/mediawiki.special.preferences.less: Fix missing tab images on Special:Preferences (duration: 00m 10s)
23:26 hoo: Had to abort scap on mw1053 (which is depooled) manually
23:26 logmsgbot: hoo Finished scap: Update WikimediaMessages (superprotect messages for wmf16) (duration: 46m 16s)
22:40 logmsgbot: hoo Started scap: Update WikimediaMessages (superprotect messages for wmf16)
22:21 logmsgbot: hoo Synchronized php-1.24wmf16/extensions/ProofreadPage/: Fix JS error while editing (duration: 00m 10s)
19:12 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias to 1.24wmf16
19:06 logmsgbot: reedy Synchronized php-1.24wmf16/includes/specials/SpecialRecentchangeslinked.php: Fix FR bug (duration: 00m 14s)
17:55 AaronSchulz: populateBacklinkNamespace.php finished on all wikis
17:13 springle: restart mysqld on labsdb1002, upgrade to mariadb 10.0.13 for bugfix
16:57 Jeff_Green: removed aluminium.wikimedia.org from production
16:50 springle: restart mysqld on labsdb1001, upgrade to mariadb 10.0.13 for bugfix
15:08 bblack: flipping ulsfo traffic back to ulsfo
11:51 logmsgbot: hoo Synchronized wmf-config/InitialiseSettings.php: Set siteGroup for testwikidata (duration: 00m 11s)
11:21 hashar: Jenkins: clearing up some obsolete symbolic links under gallium.wikimedia.org:/var/lib/jenkins/jobs/*/builds/ Running in a screen as user jenkins
05:01 springle: rsync ~1TB labsdb1001 to labsdb1003, throttled ~25MB/s
04:25 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: s2: repool db1009. s3: repool db1035. (duration: 00m 06s)
03:45 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: s2: depool db1009. repool db1018. adjust db1036 load. (duration: 00m 07s)
03:15 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Aug 12 03:14:34 UTC 2014 (duration 14m 33s)
02:33 logmsgbot: LocalisationUpdate completed (1.24wmf16) at 2014-08-12 02:32:09+00:00
02:19 logmsgbot: LocalisationUpdate completed (1.24wmf15) at 2014-08-12 02:18:36+00:00

August 11

22:33 awight: update CRM schema to wmf_civicrm:7021
21:47 andrewbogott: removed the old puppet-freshness check which should have no effect but may instead produce a torrent of alert spam https://gerrit.wikimedia.org/r/#/c/142560/
04:00 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Aug 11 03:59:17 UTC 2014 (duration 59m 16s)
03:05 logmsgbot: LocalisationUpdate completed (1.24wmf16) at 2014-08-11 03:04:15+00:00
02:34 logmsgbot: LocalisationUpdate completed (1.24wmf15) at 2014-08-11 02:33:02+00:00

August 10

23:47 logmsgbot: ori Synchronized php-1.24wmf16/extensions/MassMessage/includes: Revert MassMessage to 9884fbb50a (duration: 00m 06s)
23:36 logmsgbot: ori Synchronized php-1.24wmf16/extensions/MassMessage: Update MassMessage for I840c98dca: Fix MassMessage::getMessengerUser() after Password API changes (duration: 00m 06s)
22:59 logmsgbot: csteipp Finished scap: Deploy Ibe28a69c9fbab00b81c53b1643df722a3f1fbf19 at Erik's request (duration: 26m 13s)
22:33 logmsgbot: csteipp Started scap: Deploy Ibe28a69c9fbab00b81c53b1643df722a3f1fbf19 at Erik's request
16:25 logmsgbot: reedy Finished scap: Rebuild l10n cache for WikimediaMessages (duration: 22m 12s)
16:02 logmsgbot: reedy Started scap: Rebuild l10n cache for WikimediaMessages
15:01 logmsgbot: hoo Synchronized php-1.24wmf16/extensions/CentralAuth/: (no message) (duration: 00m 25s)
15:00 logmsgbot: hoo Synchronized php-1.24wmf15/extensions/CentralAuth/: (no message) (duration: 00m 25s)
13:53 Reedy: Grant staff "superprotect" right per Robla/Erik request
13:02 logmsgbot: tstarling Synchronized wmf-config/InitialiseSettings.php: Idfa21125 (duration: 00m 05s)
13:02 logmsgbot: tstarling Synchronized wmf-config/CommonSettings.php: Idfa21125 (duration: 00m 06s)
12:08 mutante: re-enabling puppet and services on tarin
11:57 mutante: tarin - stopping poolcounterd, gmond,.. (Tampa, should really not be in use)
03:15 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Aug 10 03:14:52 UTC 2014 (duration 14m 51s)
02:34 logmsgbot: LocalisationUpdate completed (1.24wmf16) at 2014-08-10 02:33:15+00:00
02:20 logmsgbot: LocalisationUpdate completed (1.24wmf15) at 2014-08-10 02:19:53+00:00

August 9

15:22 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: I313b09ffc: Don't require native CDB support to load {interwiki,trustedxff}.cdb (duration: 00m 05s)
14:25 Reedy: Removed <= MediaWiki 1.24wmf5
13:29 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: move s4 api traffic back to db1042 (duration: 00m 06s)
11:32 mutante: added Ryan Lane to NDA LDAP group
03:21 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Aug 9 03:20:07 UTC 2014 (duration 20m 6s)
02:37 logmsgbot: LocalisationUpdate completed (1.24wmf16) at 2014-08-09 02:36:52+00:00
02:20 logmsgbot: LocalisationUpdate completed (1.24wmf15) at 2014-08-09 02:19:03+00:00

August 8

21:24 Reedy: mw1130 seems to be dead (unresponsive to ping)
21:21 logmsgbot: reedy Synchronized wmf-config/interwiki.cdb: (no message) (duration: 01m 04s)
21:06 awight: deployed crm default/settings.php
17:07 mutante: jenkins/puppet-compiler - granting new LDAP group "nda" the same rights already given to matanya (and wmde even has more)
16:06 bblack: datacenter traffic mapping back to normal, varnish fix/wipe/restart/etc work on pause for the weekend in a stable state
16:02 andrewbogott: merging https://gerrit.wikimedia.org/r/#/c/150273/ which affects every puppet log everywhere...
14:22 mutante: RT - reverted permission change for access requests requestors per robh
13:50 mutante: RT - granted permission to show ticket summary for role requestor in queue access-requests
12:49 akosiaris: uploaded ruby-jsduck 5.3.4-1wmftrusty1 and ruby-rkelly-remix 0.0.6-1trusty1 on apt.wikimedia.org
12:33 ori: testwiki up, judgement poor
12:28 hashar: Jenkins: somehow the ArtifactDeployer plugin got upgraded on Aug 7th 20:57 UTC despite it being broken bug 69197. Attempting manual downgrade
12:13 hashar: reloading Jenkins
12:07 akosiaris: ifconfig br0 0.0.0.0 on platinum to get rid of the IP on that interface and have facter work more reliably. This does not matter right now as it is an evaluation machine but logging it for completeness
12:03 logmsgbot: ori rebuilt wikiversions.cdb and synchronized wikiversions files: (no message)
11:32 _joe_: rebooting mw1017
11:29 akosiaris: mw1130 has broken disk
11:09 ori: running rsync-common on mw1017
11:02 logmsgbot: hoo Synchronized php-1.24wmf16/extensions/CentralAuth/: Another shot towards bug 39996 (duration: 01m 04s)
11:01 logmsgbot: hoo Synchronized php-1.24wmf15/extensions/CentralAuth/: Another shot towards bug 39996 (duration: 01m 04s)
09:29 _joe_: reimaging mw1017 aka testwiki.
06:03 springle: ongoing schema changes: rev_content_model, rev_content_format. on terbium, osc_host.sh processes ok to kill in emergency
03:13 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Aug 8 03:12:21 UTC 2014 (duration 12m 20s)
02:29 logmsgbot: LocalisationUpdate completed (1.24wmf16) at 2014-08-08 02:28:39+00:00
02:17 logmsgbot: LocalisationUpdate completed (1.24wmf15) at 2014-08-08 02:16:13+00:00

August 7

19:19 jgage: rebooting analytics1021 for kernel upgrade
18:55 bblack: starting the process of fixing upload cache sizes, there will be periodic slim 5xx spikes...
16:31 Jeff_Green: temporarily disabling icinga notifications for ocg100[123] ocg service check
16:09 logmsgbot: krinkle Synchronized php-1.24wmf16/extensions/GlobalCssJs/GlobalCssJs.hooks.php: 4bbf4e0ed92f9a09 (duration: 00m 05s)
15:48 mutante: zirconium - attempt to fix apache site setup manually
15:46 logmsgbot: reedy Synchronized wmf-config/extension-list-labs: (no message) (duration: 00m 13s)
15:38 logmsgbot: reedy Synchronized php-1.24wmf16/maintenance/findMissingFiles.php: (no message) (duration: 00m 20s)
15:37 logmsgbot: reedy Synchronized php-1.24wmf15/maintenance/findMissingFiles.php: (no message) (duration: 00m 17s)
15:12 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 13s)
14:43 akosiaris: uploaded varnish_3.0.5plus~x-wm7trusty1 on apt.wikimedia.org (for usage in trusty labs machines, notably cxserver)
14:23 mutante: shutting down elastic1018
14:12 ^d: elastic1018: blacklisted from shard allocation since it's dead
14:05 mutante: depooled elastic1018 - service wasn't running and signs of broken hardware (SSD)
13:57 mark: Temporarily set max connections to swift from cp1049 backend varnish from 1000 to 2000
13:56 mutante: starting elasticsearch on elastic1018
12:23 hashar: Zuul upgraded labs branch to match production (i.e. have same version of Zuul cloner)
12:20 hashar: restarting Zuul
11:25 logmsgbot: hoo Synchronized wmf-config/InitialiseSettings.php: I53f76a35ac - No longer allow voyage 'crats to usermerge (duration: 00m 15s)
11:13 akosiaris: removed laner@wikimedia.org entirely. It pointed to rlane@wikimedia.org which no longer exists
11:11 akosiaris: removed rlane from root@wikimedia.org and usability@wikimedia.org
10:45 mutante: iron, bast1001 - installed package upgrades
09:13 hashar: Jenkins: polling a new Jenkins slave using Trusty integration-slave1006-trusty [10.68.17.223] with 4 CPU. Copy pasted from 1004-trusty
08:32 hashar: Jenkins: switching job from https://github.com/wmf-analytics/libcidr/ to https://gerrit.wikimedia.org/r/analytics/libcidr
07:44 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: move s4 api traffic to db1056 (duration: 00m 07s)
07:39 mark: Set OSPF metric 1000 on cr2-eqiad:xe-5/2/2 (GTT link)
05:39 springle: labsdb1002 restart
03:48 springle: labsdb1001 restart
03:09 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Aug 7 03:08:49 UTC 2014 (duration 8m 48s)
02:28 logmsgbot: LocalisationUpdate completed (1.24wmf16) at 2014-08-07 02:27:52+00:00
02:16 logmsgbot: LocalisationUpdate completed (1.24wmf15) at 2014-08-07 02:15:45+00:00

August 6

21:33 hashar: Jenkins: moved mediawiki-core-regression-hhvm-master to run on Trusty instance
20:26 hashar: Jenkins: downgraded ansicolor plugin from 0.4 to 0.3.1 Some colors.js function emits ANSI codes to reset the color which are not properly understood
20:06 hashar: I have broken Zuul/Jenkins :-]
18:53 hashar: Jenkins slow startup is bug 69197
18:50 hashar: restarting jenkins
18:49 hashar: Stopping Jenkins. Reverting upgrade of artifact deployer plugin
18:10 mutante: puppet-catalog-compiler says to "wait while Jenkins is getting ready to work"
17:20 hashar: Jenkins processes jobs again, the UI will take a bunch of hours to load though due to some issue when initializing
17:14 hashar: killed Jenkins
17:12 _joe_: stopped the jobrunner on mw1053, was running in fcgi mode unpuppetized and with a broken vhost. Fixed it, it started spawning exceptions. DO NOT enable puppet again
17:02 ^d: jenkins restarted, was stuck
15:52 hashar: Restarted Zuul and Zuul-merger on gallium to tweak logging settings 152118
11:30 logmsgbot: hoo Synchronized wmf-config/CommonSettings.php: Grant 'centralauth-rename' to 'steward' (duration: 00m 24s)
11:26 logmsgbot: demon Synchronized wmf-config/abusefilter.php: (no message) (duration: 00m 19s)
10:10 hashar: Jenkins web interface is back up
09:54 logmsgbot: demon Synchronized wmf-config/abusefilter.php: abuse filter settings for fawiki (duration: 00m 21s)
07:33 hashar: restarting Jenkins. It apparently likes to parse the whole history on reload, so aborting that.
07:13 hashar: Upgrading Jenkins plugin and restarting.
07:04 hashar: upgrading Jenkins to latest LTS
03:11 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Aug 6 03:10:06 UTC 2014 (duration 10m 5s)
02:30 logmsgbot: LocalisationUpdate completed (1.24wmf16) at 2014-08-06 02:29:00+00:00
02:17 logmsgbot: LocalisationUpdate completed (1.24wmf15) at 2014-08-06 02:16:38+00:00

August 5

15:07 logmsgbot: root gracefulled all apaches
15:03 logmsgbot: root gracefulled all apaches
12:30 hasharEat: Upgrading python-gear on gallium and restarting zuul and zuul-merger
12:26 akosiaris: uploaded python-gear_0.5.5-1 on apt.wikimedia.org
03:09 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Aug 5 03:08:30 UTC 2014 (duration 8m 29s)
02:29 logmsgbot: LocalisationUpdate completed (1.24wmf16) at 2014-08-05 02:27:58+00:00
02:17 logmsgbot: LocalisationUpdate completed (1.24wmf15) at 2014-08-05 02:16:52+00:00
00:41 springle: ongoing schema changes: ar_content_model, ar_content_format. on terbium, osc_host.sh processes ok to kill in emergency

August 4

22:58 bblack: rebooting ms-be1012
21:47 ottomata: reenabling puppet on analytics1027
21:46 jgage: all kafka brokers upgraded to 0.8.1.1 and data replicated: done
20:37 ottomata: stopping puppet on analytics1027 to temporarily disable camus cron job
19:07 ottomata: starting upgrade of kafka cluster
19:02 logmsgbot: maxsem Synchronized php-1.24wmf16/includes/User.php: https://gerrit.wikimedia.org/r/#/c/151691/ (duration: 00m 06s)
18:57 jgage: beginning kafka upgrade: disabling puppet on brokers
13:17 apergos: stopped labs rsync job from dataset1001, mount of labstore1003 was borked, removed 90GB of stuff on /mnt/data (= /) filesystem, restarted nfsd on dataset1001, dumps back to going
03:12 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Aug 4 03:11:03 UTC 2014 (duration 11m 2s)
02:29 logmsgbot: LocalisationUpdate completed (1.24wmf16) at 2014-08-04 02:27:58+00:00
02:17 logmsgbot: LocalisationUpdate completed (1.24wmf15) at 2014-08-04 02:16:46+00:00

August 3

03:30 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Aug 3 03:28:56 UTC 2014 (duration 28m 55s)
02:28 logmsgbot: LocalisationUpdate completed (1.24wmf16) at 2014-08-03 02:27:44+00:00
02:17 logmsgbot: LocalisationUpdate completed (1.24wmf15) at 2014-08-03 02:16:39+00:00

August 2

15:28 godog: reboot ms-be1008, stuck on xfs errors and most processes in D state
14:10 Krinkle: Restarting Zuul
14:08 hashar: Jenkins / Zuul stuck bug 69045
14:00 Krinkle: Restarting Jenkins in attempt to unstuck the clogged Zuul pipeline for gallium
04:21 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Aug 2 04:20:45 UTC 2014 (duration 20m 44s)
02:33 logmsgbot: LocalisationUpdate completed (1.24wmf16) at 2014-08-02 02:32:36+00:00
02:21 logmsgbot: LocalisationUpdate completed (1.24wmf15) at 2014-08-02 02:20:02+00:00
01:49 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1018, replag (duration: 00m 06s)
00:43 Krinkle: Restarting Jenkins on gallium because the pipeline is clogged

August 1

20:25 andrewbogott: shortened the logrotate interval on vanadium; disk space critical should resolve soon
18:10 logmsgbot: csteipp Synchronized php-1.24wmf16/extensions/CentralAuth: Fix for bug 69007 - logins failing for old style hashes (duration: 00m 06s)
17:32 AaronSchulz: Restarted maintenance/populateBacklinkNamespace.php on enwiki
17:31 logmsgbot: aaron Synchronized php-1.24wmf15/maintenance/populateBacklinkNamespace.php: e1cea29342f964cd9a720310185b09ca41eb1a4a (duration: 00m 04s)
17:16 akosiaris: upgraded etherpad-lite on zirconium to 1.4.0-2. Uploaded etherpad-lite_1.4.0-2 on apt.wikimedia.org
17:11 logmsgbot: aaron Synchronized php-1.24wmf15/includes: d218d86dff90a5f0110353c492bd2e8ddaf35497 (duration: 00m 08s)
17:09 logmsgbot: aaron Synchronized php-1.24wmf16/includes: f1a8ff7f802b57cc9f452d47c4c762a185ed93c2 (duration: 00m 06s)
15:48 logmsgbot: reedy Synchronized php-1.24wmf16/includes/specials/SpecialRecentchangeslinked.php: (no message) (duration: 00m 14s)
12:07 apergos: powercycled dataset1001, inaccessible via mgmt console, only visible message was 'mnt.nfs failed'
09:10 _joe_: apache mediawiki::web train finished its run. re-enabling puppet on all appservers
07:47 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Aug 1 07:46:04 UTC 2014 (duration 46m 3s)
07:24 _joe_: stopping puppet on appservers to deploy a potentially dangerous case
05:16 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: Move enwiki api traffic away from lagging slaves (duration: 00m 07s)
03:12 logmsgbot: LocalisationUpdate completed (1.24wmf16) at 2014-08-01 03:11:14+00:00
02:40 logmsgbot: LocalisationUpdate completed (1.24wmf15) at 2014-08-01 02:38:56+00:00
00:52 logmsgbot: catrope Synchronized php-1.24wmf16/extensions/VisualEditor/lib/ve/modules/ve/ui/inspectors/ve.ui.CommentInspector.js: Fix typo in class name (duration: 00m 10s)

July 31

23:23 logmsgbot: mwalker Synchronized php-1.24wmf16: Updating core and Flow for SWAT (duration: 00m 53s)
23:05 logmsgbot: mwalker Synchronized wmf-config: Updating configuration for 150145 (duration: 00m 05s)
21:17 RobH: blog.wikimedia.org cname changed to migrate over to wp servers
20:22 AaronSchulz: Started populateBacklinkNamespace.php on s1-s3,s5-s7 (commons already running)
20:13 cscott: updated OCG to version d2919c59eb09e09fc87777696411a070620aef45
19:40 hashar: Jenkins built its first hhvm extension \O/ https://integration.wikimedia.org/ci/job/php-FastStringSearch-hhvm-build/2/console
19:24 Coren_away: labsdb1005 had to blow away the postgres slave: was using all the space on / because DB at wrong spot (should have been /srv/postgres)
18:40 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 15s)
18:27 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 15s)
18:09 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.24wmf16
18:02 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias to 1.24wmf15
17:47 logmsgbot: aaron Synchronized wmf-config/CommonSettings.php: Increased "htmlCacheUpdate" throttle limit (duration: 00m 07s)
17:46 logmsgbot: reedy Finished scap: testwiki to 1.24wmf16 and build l10n cache (duration: 22m 35s)
17:23 logmsgbot: reedy Started scap: testwiki to 1.24wmf16 and build l10n cache
14:57 bblack: added labstore1003 to filter labs-in4 terms allow-labstore-(udp|tcp)4 on cr[12]-eqiad
14:33 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: Allow sysops and 'crats on wikimania2014wiki to grant confirmed (duration: 00m 15s)
14:12 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikivoyages to 1.24wmf15
14:12 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 14s)
14:10 logmsgbot: reedy Finished scap: Rebuild 1.24wmf15 l10n cache for WikimediaMessages updates (duration: 22m 40s)
14:05 bblack: removed labs-in4 and labs-in6 filters on vlan 1117 (labs-hosts1-a-eqiad) on cr[12]-eqiad
13:47 logmsgbot: reedy Started scap: Rebuild 1.24wmf15 l10n cache for WikimediaMessages updates
13:44 logmsgbot: reedy Synchronized php-1.24wmf15/extensions/RelatedSites/: (no message) (duration: 00m 15s)
13:44 logmsgbot: reedy Synchronized php-1.24wmf15/extensions/WikimediaMessages: (no message) (duration: 00m 14s)
12:10 hashar: stopping Jenkins and restarting it
12:04 hashar: reloading Jenkins configuration
11:37 hashar: Jenkins: upgrading almost all jobs to use a new label 'UbuntuPrecise' bug 68340 150785
10:49 hashar: Jenkins: attempting to poll a Trusty slave (integration-slave1004-trusty [10.68.17.148] with label UbuntuTrusty).
10:32 hashar: Jenkins: tweaking jobs labels, that might eventually screw up Zuul/Jenkins entirely.
08:43 _joe_: start rolling reload of nginx to catch up with the new ssl config
06:50 springle: labsdb1001 migration complete, should be all systems go
03:19 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Jul 31 03:18:07 UTC 2014 (duration 18m 6s)
02:36 logmsgbot: LocalisationUpdate completed (1.24wmf15) at 2014-07-31 02:35:29+00:00
02:20 logmsgbot: LocalisationUpdate completed (1.24wmf14) at 2014-07-31 02:19:17+00:00
02:06 springle: labsdb1001 migrating to mariadb 10, expect read-only and downtime, see labs-l

July 30

23:27 logmsgbot: maxsem Synchronized php-1.24wmf15/extensions/MwEmbedSupport/: (no message) (duration: 00m 03s)
23:27 logmsgbot: maxsem Synchronized php-1.24wmf15/extensions/Wikidata/: (no message) (duration: 00m 08s)
23:26 logmsgbot: maxsem Synchronized php-1.24wmf15/extensions/SyntaxHighlight_GeSHi/: (no message) (duration: 00m 05s)
23:23 logmsgbot: maxsem Synchronized php-1.24wmf14/extensions/Wikidata: (no message) (duration: 00m 11s)
23:13 logmsgbot: maxsem Synchronized wmf-config: (no message) (duration: 00m 05s)
21:04 AaronSchulz: Started populateBacklinkNamespace.php on wikidata and commons
21:02 bblack: turned icinga email/sms back on
20:24 bblack: icinga back online again
19:57 bblack: shutting off icinga to make some optimizations
19:20 bblack: icinga is now substantially back online. email/sms still disabled for now, and downtimes/acks need to be re-added for known issues
19:06 logmsgbot: csteipp Synchronized php-1.24wmf14/includes/: (no message) (duration: 00m 05s)
19:04 logmsgbot: csteipp Synchronized php-1.24wmf15/includes/: (no message) (duration: 00m 07s)
18:59 bblack: icinga coming back up again for the first time, expect random strangeness to be ignored
18:46 bblack: temporarily hard-disabling email/sms from icinga via 'mv /usr/bin/mail /usr/bin/mail-disabled' on neon to prevent icinga spam on next startup attempt
17:55 bblack: stopping icinga service for now while working out other details
17:25 tacotuesday: repooled elastic1018 and elastic1019 as well
17:21 Coren: labmon1001 rebooting (final check for proper raid+lvm autodetection)
17:08 bblack: working on bringing up new neon install (first puppet run, etc)
17:01 Coren: labmon1001 rebooting (partitioning changes on primary disks)
16:53 tacotuesday: elastic1017 repooled, shards allocating
16:13 bd808: scap and dologmsg from tin won't work until neon is back up and running tcpircbot
16:07 bd808|deploy: Synchronized touch: no-op sync to test scap update (duration: 00m 05s)
16:06 bd808|deploy: scap announce failed -- timeout connecting to tcpircbot on neon.wikimedia.org
16:04 bd808|deploy: Updated scap to 4871208 (rely on $PATH for scap scripts)
15:21 logmsgbot: hoo Synchronized php-1.24wmf15/extensions/Wikidata/extensions/Wikibase/lib/resources/wikibase.js: touch (duration: 00m 20s)
15:17 hashar: upgrading php5 on jenkins slaves
15:07 cmjohnson1: shutting down neon
14:46 logmsgbot: demon Synchronized wmf-config/CirrusSearch-production.php: (no message) (duration: 00m 04s)
14:35 logmsgbot: demon Synchronized wmf-config/PrivateSettings.php: Swift config for Cirrus (duration: 00m 08s)
14:30 godog: rolling restart of ms-fe* to pick up search backup user
14:17 bblack: rebooting neon again, trying to fix the disk situation
14:11 Coren: reinstalling labmon1001 -> change disk partitioning scheme
13:50 springle: neon read-only fs. fsck + reboot
13:16 manybubbles: rebuilding Cirrus index for commons to pick up weighted all field
11:17 _joe_: enabling puppet on all mw* servers
11:15 _joe_: re-enabling puppet on mw1019, last bunch of tests, then re-enabling globally
10:58 _joe_: re-enabling puppet on mw1018, testwiki upgraded to the new config and looks fine
09:25 godog: set weight for ms-be1014 and ms-be1015 to 2300
08:58 _joe_: stopping puppet on the appservers, in preparation for releasing change 148099
08:30 _joe_: powercycling neon, doesn't respond to requests, ssh hangs, console dark
06:41 springle: labsdb1001 work in progress; it may misbehave. see labs-l for updates
04:29 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Jul 30 04:27:56 UTC 2014 (duration 27m 55s)
03:39 logmsgbot: LocalisationUpdate completed (1.24wmf15) at 2014-07-30 03:38:28+00:00
02:51 logmsgbot: LocalisationUpdate completed (1.24wmf14) at 2014-07-30 02:50:14+00:00
01:47 bblack: ip addr del for cp4017's ip6_mapped addr on cp4018 (no idea why it was there...)

July 29

23:37 logmsgbot: catrope Finished scap: SWAT updates for wmf15, I'm lazy (duration: 07m 02s)
23:30 AaronSchulz: Updated /srv/jobrunner to d2298139ea22bf8e48de066a73f28024b140ea33
23:30 logmsgbot: catrope Started scap: SWAT updates for wmf15, I'm lazy
23:28 logmsgbot: catrope Synchronized php-1.24wmf14/extensions/VisualEditor: (no message) (duration: 00m 05s)
23:28 logmsgbot: catrope Synchronized php-1.24wmf14/extensions/MobileFrontend: (no message) (duration: 00m 05s)
23:18 logmsgbot: catrope Synchronized wmf-config/: Do not put OCG in sidebar (duration: 00m 04s)
23:11 logmsgbot: catrope Synchronized wmf-config/: Enable TemplateData GUI on nlwiki (duration: 00m 05s)
23:10 bblack: took OCG service IP out of downtime in icinga, it's live
23:06 logmsgbot: mwalker Synchronized wmf-config: Enabling OCG in production (duration: 00m 04s)
23:05 logmsgbot: aaron Synchronized rpc: 0df032d957155aa475d99e2b887ba98b9a4c32fd (duration: 00m 07s)
23:04 logmsgbot: cscott Synchronized wmf-config: (no message) (duration: 00m 12s)
23:03 logmsgbot: cscott updated /a/common to Iae1ac79d5: Enable OCG in production
22:55 cscott: updated OCG to version aeb8623d6ebe41ae7c7e36c57844bd9ea8e6d595
22:50 RoanKattouw: Fixed ownership of slot0/cache on wikitech (virt1000), was root:root but should have been www-data:www-data
22:24 RoanKattouw: Updated lib/ve submodule inside extensions/VisualEditor on virt1000; wikitechwiki was running a Frankenstein version of VE that was part yesterday's code, part code from April
21:47 logmsgbot: ori Synchronized rpc/RunJobs.php: Ia62e9158f: Added a streamlined RunJobs that can be used by redisJobService (2/2) (duration: 00m 03s)
21:47 logmsgbot: ori Synchronized multiversion: Ia62e9158f: Added a streamlined RunJobs that can be used by redisJobService (1/2) (duration: 00m 03s)
21:44 Reedy: cleared bottuzzu@itwiki watchlist
21:32 spagewmf: spage ran `mwscript namespaceDupes.php --wiki=enwiki --prefix Topic`, 5 pages renamed
21:22 logmsgbot: spage Synchronized wmf-config/InitialiseSettings.php: Enable Flow on Wikimania testing page (duration: 00m 13s)
21:22 logmsgbot: ori updated /a/common to Ia62e9158f: Added a streamlined RunJobs that can be used by redisJobService
21:18 logmsgbot: spage updated /a/common to I3b4622e27: Wikivoyages back to 1.24wmf14
20:54 logmsgbot: aaron Synchronized php-1.24wmf14/includes/media: b45248509c07acb8146d6e735ef68dff193ac290 (duration: 00m 07s)
19:46 Krinkle: Reloading Zuul to deploy I7f80ee0b85d29791b7
19:15 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikivoyages back to 1.24wmf14
19:14 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikivoyages back to 1.24wmf15...
19:14 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 14s)
19:09 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikivoyages back to 1.24wmf14
18:43 cmjohnson1: power cycling virt1009
18:29 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias to 1.24wmf15
18:28 logmsgbot: reedy Synchronized php-1.24wmf15/extensions/Wikidata/extensions/Wikibase/lib/config/WikibaseLib.default.php: touch (duration: 00m 16s)
18:26 bblack: removed "filter { input labs6-in; }" from ae3.1119 (labs-support1-c-eqiad) on cr[12]-eqiad
17:52 logmsgbot: aaron Synchronized php-1.24wmf15/includes/media: 76459cebd9cfbb33e9845f7acd8b8c1382cdae61 (duration: 00m 08s)
16:56 logmsgbot: hoo Synchronized wmf-config/Wikibase.php: Bump $wgCacheEpoch for testwikidata (duration: 00m 08s)
16:52 logmsgbot: hoo Synchronized php-1.24wmf15/extensions/Wikidata/: Touch JS (duration: 00m 10s)
16:52 logmsgbot: hoo Synchronized php-1.24wmf14/extensions/Wikidata/: Touch JS (duration: 00m 11s)
16:50 logmsgbot: hoo Synchronized wmf-config/Wikibase.php: Only declare "special" sitegroups for testwikidata (duration: 00m 07s)
16:48 logmsgbot: hoo Synchronized wmf-config/Wikibase.php: Only declare "special" sitegroups for testwikidata (duration: 00m 08s)
16:47 logmsgbot: hoo Finished scap: Updating Wikidata with various changes for testwikidata and a client bug fix. (duration: 27m 27s)
16:37 cmjohnson1: replacing defective disk virt1009
16:20 logmsgbot: hoo Started scap: Updating Wikidata with various changes for testwikidata and a client bug fix.
16:10 logmsgbot: hoo Synchronized wmf-config/Wikibase.php: Make testwikidata use the "special" sitelink group. Preparations for submodule updates. (duration: 00m 08s)
16:10 bd808: logstash log event volume up after restart
16:09 bd808: restarted logstash on logstash1001.eqiad.wmnet; log volume looked to be down from expected levels
16:08 _joe_: reenabled puppet on mw1053
16:03 logmsgbot: hoo Synchronized wmf-config/InitialiseSettings.php: Enable Wikibase other projects links per default for ruwiki (duration: 00m 07s)
15:13 manybubbles: building cirrus indexes for group0 wikis in place to turn on the weighted all field we'll use for performance improvements later
15:06 logmsgbot: manybubbles Synchronized wmf-config: SWAT - deploy cirrussearch all field stage 2 part 2 (duration: 00m 04s)
15:06 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: SWAT - deploy cirrussearch all field stage 2 part 1 (duration: 00m 04s)
13:54 logmsgbot: hashar Synchronized wmf-config/InitialiseSettings.php: added Universiteits Museum Utrecht to the wgCopyUploadsDomains array 150163 (duration: 00m 04s)
13:38 ottomata: restarted gmetad on nickel, seems to have brought ganglia back up
11:30 _joe_: upgrading packages on mw1053, for testing hhvm with pcre-jit enabled
10:35 _joe_: puppet re-enabled on the appservers
10:29 _joe_: temporarily stopping puppet on appservers, releasing a potentially dangerous puppet change
09:10 _joe_: stopping jobrunner on mw1053, disabling puppet as well - running tests
09:02 hashar: restarted zuul-server and zuul-merger on gallium (new version though that is a noop)
09:00 hashar_: Zuul bumping Zuul cloner from patchset 21 to patchset 23. Deploying with tag wmf-deploy-2014-07-29-1
07:51 akosiaris: uploaded PHP 5.3.10-1ubuntu3.13+wmf1 on apt.wikimedia.org. Puppet will upgrade it across the fleet within 20 mins
03:48 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Jul 29 03:47:39 UTC 2014 (duration 47m 38s)
03:11 logmsgbot: LocalisationUpdate completed (1.24wmf15) at 2014-07-29 03:10:31+00:00
02:36 logmsgbot: LocalisationUpdate completed (1.24wmf14) at 2014-07-29 02:35:18+00:00
00:44 logmsgbot: aaron Synchronized php-1.24wmf15/maintenance/runJobs.php: fcfa3153e53dc70e6cd190a087e7bd577fe380fb (duration: 00m 03s)
00:27 logmsgbot: aaron Synchronized php-1.24wmf15/maintenance: f754c239ce93fc5f2db19e93f4fe8a1d1ba7bc27 (duration: 00m 04s)
00:27 logmsgbot: aaron Synchronized php-1.24wmf15/includes: f754c239ce93fc5f2db19e93f4fe8a1d1ba7bc27 (duration: 00m 06s)

July 28

23:58 logmsgbot: ori Finished scap: I42c07b64: Update MobileFrontend (duration: 17m 37s)
23:41 logmsgbot: ori Started scap: I42c07b64: Update MobileFrontend
23:33 logmsgbot: ori Synchronized php-1.24wmf15/extensions/VisualEditor: Update VisualEditor to I944f8fbfa (duration: 00m 04s)
23:25 logmsgbot: ori Synchronized wmf-config/InitialiseSettings.php: I369dbad6e: Allow crats to add/remove petitiondata group on foundationWiki (duration: 00m 04s)
23:21 AaronS: Updated /srv/jobrunner to 0bb0ad62dd9240e0f67b2ded4519f125de13dfbc
23:12 mutante: temp. disabled puppet on neon and ircecho
23:06 mutante: graceful apache on palladium
21:12 hashar: Gerrit: allowed JenkinsBot to submit patches on wikimedia/bots (and thus on all child repositories)
20:50 hashar: operations/puppet.git manifests should no longer have leading tabs (Gerrit change I69ddc72f5a072ac7dc4f67622b65f36a70d3c021)
20:08 bblack: intermittent 5xx are most likely varnish restarts off and on rest of today
19:51 hashar: Zuul: stopped / started process to clear up obsolete changes stuck in queue
19:47 hashar: Jenkins/Zuul lost connection somehow. Disabled/Reenabled gearman client in Jenkins
19:44 hashar: Jenkins: updated qunit jobs to roam on both gallium and lanthanum (were previously tied to run only on gallium)
19:42 ottomata: restarted varnishkafka on some esams hosts that have old misconfigured vk processes
19:13 ottomata: restarting varnishkafka on amssq31
19:08 ottomata: restarting varnishkafka on cp3013
17:46 logmsgbot: aaron Synchronized php-1.24wmf14/includes/jobqueue/JobQueueFederated.php: 87e7bfceb795d065d6157ac8ce3381a7814000b5 (duration: 00m 03s)
17:38 logmsgbot: aaron Synchronized php-1.24wmf15/includes/jobqueue/JobQueueFederated.php: 12ce1dc1ec46b06d1160e142ddfaf8dcb1c9f131 (duration: 00m 04s)
16:30 andrewbogott: updated wikitech to 1.24wmf15; turned on OAuth
16:05 Nemo_bis: andrewbogott> Nikerabbit: I'm upgrading it [wikitech wiki], it'll be flaky for a bit
16:00 manybubbles: done with SWAT
15:57 logmsgbot: manybubbles Synchronized php-1.24wmf14/extensions/VisualEditor/: SWAT - fix visual editor bug - Changes made after reviewing changes are not sent (when caching is enabled) (duration: 00m 07s)
15:46 logmsgbot: manybubbles Synchronized php-1.24wmf15/extensions/VisualEditor/: SWAT - fix visual editor bug - Changes made after reviewing changes are not sent (when caching is enabled) (duration: 00m 08s)
15:41 hoo: Removed all right holders from closed and inaccessible ukwikimedia (bug 68737)
15:39 logmsgbot: manybubbles Synchronized php-1.24wmf15/includes/specials/SpecialRevisiondelete.php: SWAT - fix fatal on revision delete (duration: 00m 08s)
15:33 logmsgbot: manybubbles Synchronized wmf-config/CommonSettings.php: SWAT load Mantle before MobileFrontend (duration: 00m 07s)
15:31 logmsgbot: manybubbles Synchronized php-1.24wmf14/extensions/Echo/: SWAT fix bad variable name in echo (duration: 00m 08s)
15:23 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: SWAT - update some permissions on eswiki (duration: 00m 08s)
15:17 logmsgbot: manybubbles Synchronized php-1.24wmf15/extensions/Echo/: SWAT - fix incorrect variable name (duration: 00m 08s)
15:14 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: SWAT - add import sources to bhwiki (duration: 00m 08s)
15:10 logmsgbot: manybubbles Synchronized php-1.24wmf14/extensions/FundraisingTranslateWorkflow/: SWAT update fundraising to fix botched deploy
12:28 hashar: Upgrading our Jenkins Job Builder fork ( d833015..666e953 )
03:00 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Jul 28 02:59:35 UTC 2014 (duration 59m 34s)
02:26 logmsgbot: LocalisationUpdate completed (1.24wmf15) at 2014-07-28 02:25:34+00:00
02:16 logmsgbot: LocalisationUpdate completed (1.24wmf14) at 2014-07-28 02:15:00+00:00

July 27

05:24 springle: mysqldump s6 dbstore1002 to dbstore1001
02:59 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Jul 27 02:58:15 UTC 2014 (duration 58m 14s)
02:25 logmsgbot: LocalisationUpdate completed (1.24wmf15) at 2014-07-27 02:24:10+00:00
02:14 logmsgbot: LocalisationUpdate completed (1.24wmf14) at 2014-07-27 02:13:44+00:00

July 26

21:29 hashar: restarting Zuul to clear up some stalled changes.
02:58 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Jul 26 02:57:52 UTC 2014 (duration 57m 50s)
02:26 logmsgbot: LocalisationUpdate completed (1.24wmf15) at 2014-07-26 02:25:46+00:00
02:16 logmsgbot: LocalisationUpdate completed (1.24wmf14) at 2014-07-26 02:15:31+00:00

July 25

22:52 mutante: Bugzilla - upgraded to 4.4.5
22:41 mutante: ocg - deleted old log dirs
19:28 hashar: Jenkins : disabling gearman plugin and reenabling it (just uncheck/save/check a box in https://integration.wikimedia.org/ci/configure )
19:25 hashar: zuul@gallium:/etc/zuul/wikimedia$ echo status|nc -q 3 localhost 4730|wc -l ... Yields: 0, which means jobs are no longer registered for some reason.
19:24 hashar: Jenkins stalled again yeahhhhh
16:59 mutante: powercycled ms-be1010 - unresponsive to ssh, nothing on mgmt
16:28 MaxSem: Updating PageImages data for mainspace on Commons from terbium
13:09 _joe_: re-enabling puppet, test run on the test host was fine.
13:03 _joe_: stopping puppet on all appservers - will reactivate after testing
11:26 logmsgbot: reedy Synchronized docroot and w: (no message) (duration: 00m 13s)
10:50 hashar: contint: manually cleared /tmp on the 3 labs jenkins slaves.
10:46 hashar: integration-slave1001.eqiad.wmflabs is out of disk space ( / /dev/vda1)
07:29 springle: shutdown tantalum per mwalker request
04:19 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Jul 25 04:18:45 UTC 2014 (duration 18m 44s)
03:31 logmsgbot: LocalisationUpdate completed (1.24wmf15) at 2014-07-25 03:30:33+00:00
02:48 logmsgbot: LocalisationUpdate completed (1.24wmf14) at 2014-07-25 02:47:17+00:00
01:21 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: Ic29ae11fa: On Labs, disable LuaSandbox's profiling feature to isolate bug 68413 (duration: 00m 04s)
00:15 mutante: imported jouncebot from github - https://gerrit.wikimedia.org/r/#/q/project:wikimedia/bots/jouncebot,n,z
00:03 K4-713: updated fundraising civicrm to 0639c11636d9

July 24

23:26 mutante: created gerrit project for jouncebot
23:06 logmsgbot: maxsem Synchronized wmf-config: https://gerrit.wikimedia.org/r/149180 (duration: 00m 05s)
21:53 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 14s)
20:41 mutante: rebooted wm-bot instance
20:21 bblack: restarted backend varnish for parsoid on cp1058
20:20 bblack: restarted backend varnish for parsoid on cp1045
20:08 logmsgbot: reedy Synchronized php-1.24wmf15/extensions/Translate: (no message) (duration: 00m 15s)
20:00 logmsgbot: reedy Synchronized php-1.24wmf15: (no message) (duration: 00m 59s)
19:58 logmsgbot: reedy Synchronized php-1.24wmf14: (no message) (duration: 01m 11s)
19:24 hashar: restarted Zuul
18:44 ori: restarted jobrunners for 01c70b1a892ac3944655f84449e89e4508894101
18:41 AaronSchulz: Updated jobrunners to 01c70b1a892ac3944655f84449e89e4508894101
18:39 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 14s)
18:34 logmsgbot: aaron Synchronized php-1.24wmf14/includes/jobqueue/aggregator/JobQueueAggregatorRedis.php: ca031131396ee1830e239d0b6a314bb571840c11 (duration: 00m 06s)
18:26 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 14s)
18:24 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.24wmf15
18:23 ori: Purged apache from SSL cluster; provisioned as a side-effect of I0b02a46f3 + I76a0d237f
18:21 godog: updated swift ring to bring ms-be1013 weight to 2300
18:17 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikipedias to 1.24wmf14
18:03 logmsgbot: reedy Finished scap: testwiki to 1.24wmf15 and build l10n cache (duration: 31m 12s)
17:32 logmsgbot: reedy Started scap: testwiki to 1.24wmf15 and build l10n cache
16:38 hashar: restarting Jenkins, it is broken again
16:10 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: Swap wikimania2013wiki to wikimania2014wiki in wmgCentralAuthLoginIcon (duration: 00m 14s)
15:55 bd808|deploy: Fetched de8022b to /a/common on tin; prod no-op change needed for beta
15:40 bd808|deploy: Fetched c7ae85e to /a/common on tin; prod no-op needed for beta
15:39 ottomata: temporarily stopping puppet on analytics1027
15:14 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings-labs.php: Fix reference thumbnail settings syntax (duration: 00m 13s)
15:13 cmjohnson1: swapping disk 8 es1001
15:10 hashar: Clearing out old Zuul references on operations/puppet.git might cause merge errors
15:10 logmsgbot: yurik Synchronized php-1.24wmf14/extensions/ZeroBanner: (no message) (duration: 01m 07s)
15:08 logmsgbot: yurik Synchronized php-1.24wmf13/extensions/ZeroBanner: (no message) (duration: 01m 11s)
14:30 logmsgbot: yurik Synchronized wmf-config/mobile.php: Font for zero banner (duration: 01m 10s)
13:38 hashar: Deleting old Zuul references in the Zuul maintained repository /srv/ssd/zuul/git/mediawiki/core/ on gallium (bug 68481). Should speed up merge operations on that repository.
10:10 hashar: Zuul code being installed on lanthanum.eqiad.wmnet; this will let us use a merger daemon there and the Zuul cloner client. 141758
05:44 springle: labsdb1002 work in progress; it may misbehave. see labs-l for updates
03:57 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Jul 24 03:56:32 UTC 2014 (duration 56m 31s)
03:09 logmsgbot: LocalisationUpdate completed (1.24wmf14) at 2014-07-24 03:08:05+00:00
02:37 logmsgbot: LocalisationUpdate completed (1.24wmf13) at 2014-07-24 02:36:35+00:00
00:44 ori: installing linux-tools on mw1053 to run perf on jobrunner

July 23

23:59 logmsgbot: maxsem Finished scap: Pick up messages forgotten during Zero deployment (duration: 26m 42s)
23:39 ori: running sync-common on mw1053.eqiad.wmnet
23:32 logmsgbot: maxsem Started scap: Pick up messages forgotten during Zero deployment
23:26 logmsgbot: maxsem Synchronized php-1.24wmf14/extensions/MultimediaViewer/: (no message) (duration: 00m 03s)
23:26 logmsgbot: maxsem Synchronized php-1.24wmf14/extensions/VisualEditor/: (no message) (duration: 00m 04s)
23:19 logmsgbot: maxsem Synchronized php-1.24wmf13/extensions/MobileFrontend/: (no message) (duration: 00m 04s)
23:18 logmsgbot: maxsem Synchronized php-1.24wmf14/extensions/MobileFrontend/: (no message) (duration: 00m 04s)
22:39 mutante: removed platinum from icinga
22:36 _joe_: installed mw1053 as the first hhvm jobrunner, currently stopped. Puppet disabled so that it won't restart the jobrunner automatically
21:49 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: I2f366fa93: Use luastandalone on HHVM (duration: 00m 03s)
21:17 hashar: Zuul is all good. It just receives too many patches :-]
20:31 bd808|deploy: Updated /a/common to 07834a9 (beta cluster: use luastandalone); no sync needed
20:30 subbu: deployed parsoid version 47d4bc83
20:27 hashar: Having no idea how to fix zuul. Restarting it and killing the whole queue :-/
20:14 mutante: contacts.wm - set $base_url in default/settings.php to https URL, and $is_https='on' in bootstrap.inc (unpuppetized?)
19:49 logmsgbot: awight Synchronized php-1.24wmf14/extensions/CentralNotice: push many CentralNotice fixes, including GeoIP cookie and hide cookie (duration: 00m 05s)
19:49 logmsgbot: awight Synchronized php-1.24wmf13/extensions/CentralNotice: push many CentralNotice fixes, including GeoIP cookie and hide cookie (duration: 00m 05s)
19:28 logmsgbot: awight Synchronized php-1.24wmf14/extensions/CentralNotice: push many CentralNotice fixes, including GeoIP cookie and hide cookie (duration: 00m 04s)
19:27 logmsgbot: awight Synchronized php-1.24wmf13/extensions/CentralNotice: push many CentralNotice fixes, including GeoIP cookie and hide cookie (duration: 00m 04s)
18:57 hashar: reenabled Gearman plugin in Jenkins. Jobs have been reregistered and seem to be proceeding again
18:55 hashar: back. attempting to fix jenkins
18:38 logmsgbot: yurik Synchronized php-1.24wmf14/extensions/: update to JsonConfig, ZeroBanner, ZeroPortal (duration: 01m 24s)
18:36 hashar: can't fix jenkins / zuul right now. Will be stalled for at least half an hour
18:35 logmsgbot: yurik Synchronized php-1.24wmf13/extensions/: update to JsonConfig, ZeroBanner, ZeroPortal (duration: 01m 27s)
18:33 hashar: Jenkins disabled and reenabled Gearman plugin. The jobs were no longer registered in Zuul gearman server :-(
18:32 hashar: Jenkins stalled
17:45 logmsgbot: hoo Synchronized wmf-config/InitialiseSettings-labs.php: (no message) (duration: 00m 07s)
17:38 godog: launched a script on ms-fe1001 to collect thumb stats, no impact expected
17:11 logmsgbot: awight Synchronized php-1.24wmf14/extensions/FundraisingTranslateWorkflow: automatic translate workflow fix for Fundraising/ pages on meta.wmo (duration: 00m 04s)
15:38 logmsgbot: reedy Synchronized php-1.24wmf14/extensions/Wikidata: touch (duration: 00m 15s)
15:34 logmsgbot: reedy Synchronized php-1.24wmf14/extensions/Wikidata: Fix css issue in entity suggester on Wikidata (duration: 00m 17s)
15:19 logmsgbot: reedy Synchronized php-1.24wmf14/resources/Resources.php: Fixing forgotten OOUI messages (duration: 00m 15s)
15:11 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: Remove flickrApiUrl from (duration: 00m 15s)
14:19 akosiaris: upgraded php5 on mw1017 (test.wikipedia.org) deployment-apache0{1,2} (beta) to 5.3.10-1ubuntu3.13+wmf1
12:42 hashar: upgraded gdnsd on gallium (used to lint operations/dns.git changes)
09:57 hashar: Zuul migrated to zuul user :)
09:43 hashar: zuul changing file ownership on gallium for /srv/ssd/zuul/git from jenkins:root to zuul:zuul
09:42 hashar: breaking zuul
05:29 springle: clone mariadb 10 labsdb1002 to labsdb100[13]
04:11 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Jul 23 04:09:54 UTC 2014 (duration 9m 53s)
03:21 logmsgbot: LocalisationUpdate completed (1.24wmf14) at 2014-07-23 03:20:45+00:00
02:50 logmsgbot: LocalisationUpdate completed (1.24wmf13) at 2014-07-23 02:49:34+00:00

July 22

23:37 logmsgbot: ebernhardson Finished scap: Update flow in wmf/1.24wmf14 (duration: 17m 08s)
23:20 logmsgbot: ebernhardson Started scap: Update flow in wmf/1.24wmf14
21:57 logmsgbot: reedy Synchronized php-1.24wmf14/extensions/WikimediaMessages/: Fix fatal for dumps (duration: 00m 15s)
21:52 logmsgbot: reedy Synchronized php-1.24wmf13/extensions/WikimediaMessages/: Fix fatal for dumps (duration: 00m 13s)
18:46 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias to 1.24wmf14 again
18:45 logmsgbot: reedy Synchronized php-1.24wmf14/extensions/CirrusSearch/: Fix fatal (duration: 00m 15s)
18:13 Reedy: Running sync-common on mw1081
18:10 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias back to 1.24wmf13 due to Wikidata and Cirrus fatals
18:06 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias to 1.24wmf14
17:48 logmsgbot: mwalker Finished scap: Deploying Petition extension to the cluster (duration: 28m 27s)
17:19 logmsgbot: mwalker Started scap: Deploying Petition extension to the cluster
17:12 logmsgbot: aude Synchronized php-1.24wmf14/extensions/Wikidata/extensions/ValueView/lib/jquery.ui/jquery.ui.suggester.js: touch jquery.ui.suggester.js for Wikidata (duration: 00m 05s)
17:06 logmsgbot: aude Synchronized php-1.24wmf14/extensions/Wikidata: Update Wikidata submodule for test wikidata, for real! (duration: 00m 06s)
17:02 logmsgbot: aude Synchronized php-1.24wmf14/extensions/Wikidata/extensions/Wikibase/lib/resources/wikibase.js: touch wikibase.js for test wikidata only, fix caching issues (duration: 00m 05s)
16:55 logmsgbot: aude Synchronized php-1.24wmf14/extensions/Wikidata/extensions/ValueView/lib/jquery.ui/jquery.ui.suggester.js: touch jquery.ui.suggester.js for Wikidata (duration: 00m 05s)
16:48 logmsgbot: aude Synchronized php-1.24wmf14/extensions/Wikidata: Update Wikidata: js and json dump fixes (duration: 00m 11s)
16:26 logmsgbot: aude Synchronized wmf-config/InitialiseSettings.php: enable WikibaseClient on test wikidata (duration: 00m 07s)
16:25 logmsgbot: aude Synchronized wmf-config/Wikibase.php: add settings for enabling WikibaseClient on test wikidata (duration: 00m 04s)
16:18 logmsgbot: reedy Purged l10n cache for 1.24wmf12
16:18 logmsgbot: reedy Purged l10n cache for 1.24wmf11
16:17 logmsgbot: reedy Purged l10n cache for 1.24wmf10
15:12 manybubbles: done with SWAT
15:11 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: SWAT - touching InitialiseSettings.php to make dblist change go (duration: 00m 06s)
15:10 logmsgbot: manybubbles Synchronized commonsuploads.dblist: SWAT add mrwiki to commonsuploads list (duration: 00m 08s)
15:06 logmsgbot: manybubbles Synchronized php-1.24wmf14/extensions/CirrusSearch/: SWAT small cirrus fixes (duration: 00m 08s)
14:48 _joe_: removed old, unused puppet 2.7 packages from reprepro for trusty
14:00 _joe_: reinstalling mw1053 in 5 minutes, downtime on icinga, puppet disabled, setting to 'false' everywhere in pybal
05:31 bblack: authdns servers (mexia, rubidium, eeden) updated to gdnsd-1.11.4~precise1
03:13 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Jul 22 03:12:10 UTC 2014 (duration 12m 9s)
02:37 logmsgbot: LocalisationUpdate completed (1.24wmf14) at 2014-07-22 02:36:27+00:00
02:14 logmsgbot: LocalisationUpdate completed (1.24wmf13) at 2014-07-22 02:13:40+00:00
00:56 mutante: tungsten,fluorine, search1001-1006 - upgraded libssl

July 21

23:44 mutante: graceful apache on magnesium
23:42 legoktm: cleaned up stalled global rename of Felipegaspars --> L'editeur
23:39 logmsgbot: maxsem Synchronized php-1.24wmf14/extensions/Flow/: https://gerrit.wikimedia.org/r/#/c/148270/ - revert previous change (duration: 00m 04s)
23:36 logmsgbot: maxsem Synchronized php-1.24wmf14/extensions/TimedMediaHandler/: https://gerrit.wikimedia.org/r/#/c/148241/ (duration: 00m 04s)
23:32 logmsgbot: maxsem Synchronized php-1.24wmf14/extensions/Flow/: https://gerrit.wikimedia.org/r/#/c/148249/ (duration: 00m 04s)
23:30 logmsgbot: maxsem Synchronized php-1.24wmf14/extensions/MobileFrontend/: https://gerrit.wikimedia.org/r/148128 (duration: 00m 06s)
23:29 logmsgbot: maxsem Synchronized php-1.24wmf14/extensions/AbuseFilter/: https://gerrit.wikimedia.org/r/#/c/148027/ (duration: 00m 06s)
23:27 logmsgbot: maxsem Synchronized php-1.24wmf14/resources/: https://gerrit.wikimedia.org/r/#/c/147854/ (duration: 00m 05s)
22:21 mutante: installing package upgrades on bast1001
21:50 RobH: shutting down ms1002 for reclaim into labstore1003
21:37 ottomata: running kafka preferred-replica-election to rebalance topics
21:27 hashar: beta: removed build timeout from beta-update-databases-eqiad Jenkins jobs. There is a huge schema change being processed by update.php
20:09 subbu: deployed parsoid version 1c9277d6
17:24 mutante: elastic1009,analytics1004,silver, various misc. boxes - upgrading libssl
17:16 mutante: installing package upgrades on iron
16:19 godog: restarted uwsgi on tungsten
16:02 andrewbogott: updated OpenStackManager on wikitech
15:48 logmsgbot: demon Synchronized wmf-config: Undeploying CommunityVoice/ClientSide extensions (duration: 00m 08s)
15:30 logmsgbot: demon Synchronized wmf-config/flaggedrevs.php: ukwiki gets FR for NS_MODULE (duration: 00m 04s)
15:25 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: Set $wgUploadNavigationUrl for plwikisource (duration: 00m 04s)
15:25 logmsgbot: demon Synchronized php-1.24wmf14/extensions/CirrusSearch: CirrusSearch to master for 1.24wmf14 (duration: 00m 07s)
15:22 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: Set $wgForceUIMsgAsContentMsg for zhwikivoyage (duration: 00m 05s)
15:17 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: TemplateData for fiwiki (duration: 00m 06s)
13:41 _joe_: restarted apache on palladium
12:01 apergos: started /usr/local/bin/dumpwikidatajson.sh in root screen session on snapshot1003
03:11 springle: restarted apache on strontium
02:58 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Jul 21 02:57:03 UTC 2014 (duration 57m 2s)
02:50 logmsgbot: krinkle Synchronized wmf-config/InitialiseSettings.php: I27c6f82af5e9b (duration: 00m 06s)
02:26 logmsgbot: LocalisationUpdate completed (1.24wmf14) at 2014-07-21 02:25:08+00:00
02:14 logmsgbot: LocalisationUpdate completed (1.24wmf13) at 2014-07-21 02:13:42+00:00

July 20

02:59 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Jul 20 02:57:59 UTC 2014 (duration 57m 58s)
02:26 logmsgbot: LocalisationUpdate completed (1.24wmf14) at 2014-07-20 02:25:54+00:00
02:15 logmsgbot: LocalisationUpdate completed (1.24wmf13) at 2014-07-20 02:14:47+00:00
01:40 ori: synced docroot/default/index.html (I005f43b96: Add width/height attributes to img to fix reflow)

July 19

23:33 logmsgbot: aaron Synchronized php-1.24wmf4/maintenance: 926e1997b53f563a4e7f3c540e32b45ddb24b3c5 & 017891ba41cc72987bf3cb441004a847d20105b4 (duration: 00m 08s)
23:33 logmsgbot: aaron Synchronized php-1.24wmf4/includes: 926e1997b53f563a4e7f3c540e32b45ddb24b3c5 & 017891ba41cc72987bf3cb441004a847d20105b4 (duration: 00m 09s)
15:43 bblack: restarted gitblit service on antimony
05:02 Krinkle: Ungracefully restarting Zuul to clear the items stuck in the queue (picked a moment with no real items waiting in the queue).
03:25 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool es1001 (duration: 00m 06s)
02:57 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Jul 19 02:56:30 UTC 2014 (duration 56m 29s)
02:27 logmsgbot: LocalisationUpdate completed (1.24wmf14) at 2014-07-19 02:26:26+00:00
02:16 logmsgbot: LocalisationUpdate completed (1.24wmf13) at 2014-07-19 02:15:20+00:00
01:09 logmsgbot: awight Synchronized php-1.24wmf14/extensions/FundraisingTranslateWorkflow: update FundraisingTranslateWorkflow submodule content (take 4) (duration: 00m 04s)
01:08 logmsgbot: awight Synchronized php-1.24wmf13/extensions/FundraisingTranslateWorkflow: update FundraisingTranslateWorkflow submodule content (take 4) (duration: 00m 04s)
01:07 logmsgbot: awight Synchronized php-1.24wmf12/extensions/FundraisingTranslateWorkflow: update FundraisingTranslateWorkflow submodule content (take 4) (duration: 00m 04s)
00:36 logmsgbot: awight Synchronized php-1.24wmf14/extensions/FundraisingTranslateWorkflow: update FundraisingTranslateWorkflow submodule content (take 3) (duration: 00m 04s)
00:36 logmsgbot: awight Synchronized php-1.24wmf13/extensions/FundraisingTranslateWorkflow: update FundraisingTranslateWorkflow submodule content (take 3) (duration: 00m 04s)
00:36 logmsgbot: awight Synchronized php-1.24wmf12/extensions/FundraisingTranslateWorkflow: update FundraisingTranslateWorkflow submodule content (take 3) (duration: 00m 04s)

July 18

22:04 logmsgbot: awight Synchronized php-1.24wmf14: update FundraisingTranslateWorkflow submodule (take 2) (duration: 00m 58s)
22:03 logmsgbot: awight updated /a/common/php-1.24wmf14 to I1036dae02: Update mediawiki/core/vendor to head to 1.24wmf14
21:00 logmsgbot: awight Synchronized php-1.24wmf13: update FundraisingTranslateWorkflow submodule (duration: 01m 04s)
20:58 awight: for the record, I actually updated to ade90e0e22492d87e6069db3a359b22ef56401a6
20:57 logmsgbot: awight updated /a/common/php-1.24wmf13 to Id3462554b: Made --maxtime a soft limit again
20:50 logmsgbot: awight Synchronized php-1.24wmf12: update FundraisingTranslateWorkflow submodule (duration: 00m 49s)
20:48 logmsgbot: awight Synchronized php-1.24wmf12: update FundraisingTranslateWorkflow submodule (duration: 00m 21s)
20:48 logmsgbot: awight updated /a/common/php-1.24wmf12 to Idf3f49941: Updating ZeroBanner
20:41 MaxSem: Load testing GeoData
19:11 mutante: restarted apache on strontium.. sigh
18:17 logmsgbot: aaron Synchronized php-1.24wmf13/maintenance/runJobs.php: ae053860dc36a07f05ab9e31299f2da0d2f66e85 (duration: 00m 03s)
18:16 logmsgbot: aaron Synchronized php-1.24wmf14/maintenance/runJobs.php: 684c21c325370aa3baac631ae9a006fc8861b952 (duration: 00m 03s)
18:05 logmsgbot: aaron Synchronized wmf-config/jobqueue-eqiad.php: Set "daemonized" flag for the redis job queue (duration: 00m 04s)
17:33 cmjohnson: replacing disk 2 es1005
17:25 mutante: temp. stopped icinga-wm to avoid channel spam
17:24 mutante: puppetmaster on strontium had 'Unexpected error in mod_passenger' causing puppet fails all over the place with error 500 on master, resumed normal operation after graceful
17:21 mutante: graceful'ed apache on strontium
14:37 godog: rolling reload of proxy-server on swift ms-fe1* to pick up changes
13:19 _joe_: re-enabling puppet, applying on a sample of hosts created no change according to my tests.
13:13 _joe_: temporarily disabling puppet on mw servers, will re-enable when I'm done with testing (again) the change
11:20 godog: restart proxy-server on ms-fe1003, as suspected it wasn't running the latest version
11:14 godog: restart proxy-server on ms-fe1003, double checking for a change in numbers reported to graphite
10:04 godog: stagger reload swift {account,object,container} server in ms-be.eqiad to pick up recon changes
06:01 AaronSchulz: Updated /srv/deployment/jobrunner to 4cddd5033efadf431e138c399b5d86542e32f196
03:55 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Jul 18 03:53:55 UTC 2014 (duration 53m 54s)
03:22 ori: Updated jobrunner to d9520c9 and restarted service on all jobrunners
03:09 logmsgbot: LocalisationUpdate completed (1.24wmf14) at 2014-07-18 03:08:02+00:00
02:45 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: Repool db1021, context RT 7916, warm up (duration: 00m 08s)
02:37 logmsgbot: LocalisationUpdate completed (1.24wmf13) at 2014-07-18 02:36:54+00:00

July 17

23:49 logmsgbot: mwalker Finished scap: SWAT for 146651, 147102, 146925, 147331, 147332, and 147206
23:19 logmsgbot: mwalker Started scap: SWAT for 146651, 147102, 146925, 147331, 147332, and 147206
21:02 csteipp: deployed fix for bug68187
20:29 ori: updated jobrunner to 71d84ea18d and restarted service
18:36 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.24wmf14
18:30 springle: db1021 raid write-cache failure, BBU at 9%
18:14 springle: db1021 disabled sync_binlog, thread tied up on fsync
18:11 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikipedias to 1.24wmf13
18:09 logmsgbot: reedy Synchronized wmf-config/: De-pool db1021 due to increasing replag (duration: 00m 14s)
17:40 logmsgbot: reedy Finished scap: testwiki to 1.24wmf14 take 2 (duration: 33m 02s)
17:30 Jeff_Green: payments1002 dist upgrade & reboot
17:21 mutante: nickel (ganglia) apt-get upgrading packages
17:13 Jeff_Green: dist-upgrade and reboot payments1003
17:07 logmsgbot: reedy Started scap: testwiki to 1.24wmf14 take 2
17:04 logmsgbot: reedy scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki="testwiki" --list-file="/a/common/wmf-config/extension-list" --output="/tmp/tmp.VsoJsYY6Q2" ' returned non-zero exit status 1 (duration: 02m 46s)
17:03 RobH: payments4 is kernel updating (per jgreen)
17:01 logmsgbot: reedy Started scap: testwiki to 1.24wmf14
15:05 logmsgbot: manybubbles Synchronized php-1.24wmf13/extensions/MultimediaViewer/: SWAT - Moving repo icon back to the right-hand side in Media Viewer (duration: 00m 05s)
15:03 logmsgbot: manybubbles Synchronized wmf-config/CommonSettings-labs.php: SWAT deploy to keep us synced, but this is a noop in prod. only anything in beta. (duration: 00m 05s)
07:27 springle: mariadb 10 on labsdb1002:3309 cloning s5 from sanitarium db1054:3308
03:33 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Jul 17 03:32:25 UTC 2014 (duration 32m 24s)
02:47 logmsgbot: LocalisationUpdate completed (1.24wmf13) at 2014-07-17 02:46:24+00:00
02:24 logmsgbot: LocalisationUpdate completed (1.24wmf12) at 2014-07-17 02:23:08+00:00

July 16

23:55 logmsgbot: maxsem Synchronized private: Clean up old mobile cruft (duration: 00m 05s)
23:17 logmsgbot: yurik Synchronized php-1.24wmf13/extensions/: update to JsonConfig, ZeroBanner, ZeroPortal (duration: 04m 03s)
23:13 logmsgbot: yurik Synchronized php-1.24wmf12/extensions/: update to JsonConfig, ZeroBanner, ZeroPortal (duration: 04m 14s)
22:34 andrewbogott: temporarily fixed puppet on tin by restarting salt-master and salt-minion. A proper fix would involve upgrading to a salt version that fixes https://github.com/saltstack/salt/issues/6306
22:29 logmsgbot: yurik Synchronized php-1.24wmf13/extensions/ZeroBanner: (no message) (duration: 03m 55s)
22:27 ori: restarted jobrunner service on all job runners
22:18 logmsgbot: yurik Synchronized php-1.24wmf12/extensions/ZeroBanner: (no message) (duration: 04m 31s)
21:50 AaronSchulz: Updated job runners to 186b9b33
21:08 legoktm: clearing Magog the Ogre's watchlist on enwp per request (173668 entries)
21:01 logmsgbot: yurik Synchronized php-1.24wmf13/extensions/: update to JsonConfig, ZeroBanner, ZeroPortal (duration: 04m 53s)
20:56 logmsgbot: yurik Synchronized php-1.24wmf12/extensions/: update to JsonConfig, ZeroBanner, ZeroPortal (duration: 04m 54s)
20:22 subbu: deploy parsoid 060dcb54
19:56 ottomata: reenabling puppet on analytics1027
19:21 ottomata: temp disabling puppet on analytics1027
17:57 akosiaris: clean puppet stored config database for osm-db100{1,2}.eqiad.wmnet, updating icinga
16:49 Reedy: Restarted jenkins again
16:12 Reedy: Restarted jenkins
16:11 Reedy: Killed jenkins
14:34 _joe_: moving the stale conf-enabled directory away on jobrunners, or when we upgrade to trusty all hell will break loose
13:06 logmsgbot: oblivian gracefulled all apaches
12:14 logmsgbot: oblivian gracefulled all apaches
12:01 _joe_: removed stale files from /etc/apache2/conf-enabled on all mw hosts
11:25 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: Take Cirrus as default from more wikis while we figure out load issues (duration: 00m 06s)
10:32 _joe_: releasing a new apache config to all mediawikis
08:54 godog: repool ms-fe1004
08:51 godog: repool ms-fe1003 and depool ms-fe1004
08:46 godog: repool ms-fe1002 and depool ms-fe1003
08:39 godog: depool ms-fe1002 for swift upgrade
05:54 springle: resuming page content model schema changes, osc_host.sh processes on terbium ok to kill in emergency
04:22 springle: restarted gitblit on antimony
03:04 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Jul 16 03:03:41 UTC 2014 (duration 3m 40s)
02:27 logmsgbot: LocalisationUpdate completed (1.24wmf13) at 2014-07-16 02:26:12+00:00
02:15 logmsgbot: LocalisationUpdate completed (1.24wmf12) at 2014-07-16 02:14:32+00:00
01:34 manybubbles: moving shards off of elastic101[789]

July 15

23:20 logmsgbot: maxsem Synchronized wmf-config/: https://gerrit.wikimedia.org/r/#/c/146615/ (duration: 00m 04s)
23:16 logmsgbot: maxsem Synchronized php-1.24wmf12/extensions/CirrusSearch/: https://gerrit.wikimedia.org/r/#q,146471,n,z (duration: 00m 05s)
23:14 logmsgbot: maxsem Synchronized php-1.24wmf13/includes/specials/SpecialVersion.php: (no message) (duration: 00m 04s)
23:13 logmsgbot: maxsem Synchronized php-1.24wmf13/extensions/CirrusSearch/: https://gerrit.wikimedia.org/r/#q,146471,n,z (duration: 00m 04s)
22:35 K4-713: synchronized payments to afa12be34769000bf8
21:34 _joe_: disabling puppet on mw1001, tests
21:26 logmsgbot: aude Synchronized php-1.24wmf13/extensions/Wikidata: Update submodule to fix entity search issue on Wikidata (duration: 00m 21s)
21:15 ori: to test r146607, locally modified upstart conf for jobrunner on mw1001 to log to /var/log/mediawiki, and restarted service
20:24 ori: restarted jobrunner on all jobrunners
20:23 AaronSchulz: Deployed /srv/jobrunner to 31e54c564d369e89613db48977eec0a5891b6498
20:21 logmsgbot: reedy Synchronized docroot and w: (no message) (duration: 00m 21s)
20:18 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias to 1.24wmf13
20:12 Krinkle: Reloading Zuul to deploy If2312bcf18bdbe8dee
20:12 bd808: log volume up after logstash restart
20:10 bd808: restarted logstash on logstash1001; log volume looked to be down from "normal"
19:55 Reedy: Applied extensions/UploadWizard/UploadWizard.sql to rowiki (re bug 59242)
18:53 manybubbles: bouncing elastic1018 to pick up new merge policy. hopefully that'll help with io thrashing
17:58 ori: _joe_ deployed jobrunner to all job runners
17:40 manybubbles: my last attempt to lower the concurrent traffic for recovery was a failure - tried again and succeeded. that seems to have fixed the echo service disruption from taking elastic1017 out of service
17:37 ori: updated jobrunner to bef32b9120
17:29 manybubbles: elastic1017 went nuts again. just shutting elasticsearch off on it for now
17:17 manybubbles: lowered Elasticsearch concurrent recovery streams to 2 (from 3) and total write rate across those streams to 20MB/sec (from 4MB/sec). This should prevent io thrash on recovery, which looked to cause echo disruptions in service while recovering from some other disruption.
16:25 _joe_: all mw servers updated
16:10 _joe_: mw1100 and onwards updated
16:00 _joe_: mw1060-mw1099 updated
15:57 manybubbles: restarting Elasticsearch on elastic1017 - it's thrashing the disk again. I'm still not 100% sure why
15:56 _joe_: mw1020-mw1059 updated
15:53 _joe_: mw101[0-9] updated
15:51 manybubbles: elasticsearch1017 is freaking out again - maybe there is something wrong with it. odds aren't good it picked up the same shard again after restart and that shard is somehow poison just for it and not the other two nodes with the same shard....
15:47 _joe_: starting rolling update of all appservers to apache2 2.2.22-1ubuntu1.6, half of them are on 2.2.22-1ubuntu1.5 now
15:42 manybubbles: setting the filter cache on one node in the cluster set it on all. yay, I guess. Anyway, I'm going to let it soak for a while.
15:32 manybubbles: setting filter cache size to 20% on elastic1001 to see if it takes/helps us
15:19 logmsgbot: anomie Synchronized wmf-config/: SWAT: Remove dead ULS variable gerrit:145861 (duration: 00m 10s)
15:18 anomie: anomie actually committed a live hack someone left on tin (removing db1035)
15:16 logmsgbot: anomie updated /a/common to I7ca6a16d5: Switch jawiki back to lsearchd
13:52 manybubbles: after switching jawiki back to lsearchd by default load is mostly recovered. the cluster is still healing from bouncing elastic1017 and that'll take a while. the load will be a bit high during that but searches are coming back in a reasonable amount of time again
13:42 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: jawiki back to lsearchd (duration: 00m 05s)
13:38 manybubbles: elastic1017 had a load average of 60 - was thrashing in io. bounced Elasticsearch. let's see if it recovers on its own
09:09 _joe_: restarting mailman on sodium, again, for testing
08:50 godog: restart mailman on sodium after inodes freed
07:27 _joe_: restarted mailman on sodium
07:22 _joe_: stopping mailman on sodium for repairing
06:54 _joe_: killed jenkins stale process on gallium, stuck in a futex while shutting down
04:48 springle: db1035 crash cycle. down for memtest and stuff
03:34 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Jul 15 03:33:38 UTC 2014 (duration 33m 37s)
03:01 logmsgbot: LocalisationUpdate completed (1.24wmf13) at 2014-07-15 03:00:03+00:00
02:34 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1035, crashed (duration: 00m 13s)
02:30 logmsgbot: LocalisationUpdate completed (1.24wmf12) at 2014-07-15 02:29:02+00:00
02:27 springle: powercycle db1035 unresponsive

July 14

23:52 logmsgbot: mwalker Finished scap: Updating for SWAT 146304, 146306, 146149, 146165, 146166, 146282, and 146281. Also finishing awight's deploy of FundraisingTranslateWorkflow. (duration: 19m 42s)
23:32 logmsgbot: mwalker Started scap: Updating for SWAT 146304, 146306, 146149, 146165, 146166, 146282, and 146281. Also finishing awight's deploy of FundraisingTranslateWorkflow.
20:22 cscott: updated Parsoid to version d51e64097bb1b18e356584d4f3ddcfd90a6071ba
19:57 ori: postponing jobrunner deployment to tomorrow; ran over time
19:45 _joe_: doing the same on mw1064, segfaulted for the same reason
19:44 _joe_: killed a lone apache2 child on mw1152, stuck in a futex, after a segfault of another apache process. Restarted apache, now working correctly
19:03 godog: re-enabling mailman on sodium, missing list config restored
18:49 logmsgbot: awight Synchronized wmf-config: Deploying FundraisingTranslateWorkflow on metawiki (t
18:45 logmsgbot: awight Synchronized php-1.24wmf13/extensions/FundraisingTranslateWorkflow: Update FundraisingTranslateWorkflow extension (wmf13) (duration: 00m 05s)
18:43 logmsgbot: awight Synchronized php-1.24wmf12/extensions/FundraisingTranslateWorkflow: Update FundraisingTranslateWorkflow extension (duration: 00m 05s)
18:15 logmsgbot: awight Synchronized wmf-config: Revert: Deploying FundraisingTranslateWorkflow on metawiki (duration: 00m 04s)
18:03 logmsgbot: awight Synchronized wmf-config: Deploying FundraisingTranslateWorkflow on metawiki (duration: 00m 05s)
18:03 logmsgbot: awight updated /a/common to Ie7599fb6e: jawiki gets Cirrus as primary search
17:43 Krinkle: npm-cache for integration slaves got corrupted again. Depooling/Repooling integration-slave100{1,2,3} one by one to clear cache and let it warm up again.
17:35 Krinkle: Jenkins slaves in labs are unable to reach zuul.eqiad.wmnet
17:10 andrewbogott: purging old local-* service group entries from labs ldap (via purgeOldServiceGroups.php)
17:05 godog: started mailman on sodium post-reboot
17:04 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: nlwiki getting cirrus as primary (duration: 00m 04s)
15:11 logmsgbot: manybubbles Synchronized wmf-config: SWAT update cirrus settings for commons (duration: 00m 04s)
15:04 logmsgbot: manybubbles Synchronized wmf-config: SWAT update cirrus settings for commons (duration: 00m 04s)
15:01 logmsgbot: manybubbles Synchronized wmf-config: SWAT update cirrus settings for commons (duration: 00m 05s)
14:39 _joe_: rebooted nescio, stuck and with console showing just a truncated log (timestamp only)
14:33 mutante: powercycling sodium
14:02 mutante: stat1002 - "Could not find declared class ::oozie"
09:36 legoktm: ran initSiteStats.php on all wikivoyages for bug 64370
09:02 godog: repool ms-fe1001 after upgrade, basic testing successful
08:33 godog: depool ms-fe1001 for swift icehouse upgrade
02:57 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Jul 14 02:56:22 UTC 2014 (duration 56m 21s)
02:24 logmsgbot: LocalisationUpdate completed (1.24wmf13) at 2014-07-14 02:23:39+00:00
02:14 logmsgbot: LocalisationUpdate completed (1.24wmf12) at 2014-07-14 02:12:54+00:00

July 13

22:12 ori: stopping puppet on rcs1001 to debug nginx issue
21:03 Krinkle: git-deploy: Deploying integration/slave-scripts I7f2b476807465
02:54 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Jul 13 02:53:33 UTC 2014 (duration 53m 32s)
02:25 logmsgbot: LocalisationUpdate completed (1.24wmf13) at 2014-07-13 02:23:56+00:00
02:14 logmsgbot: LocalisationUpdate completed (1.24wmf12) at 2014-07-13 02:13:32+00:00
02:12 legoktm: migratePass0.php finished a while back

July 12

22:21 legoktm: running foreachwiki extensions/CentralAuth/maintenance/migratePass0.php (bug 67350)
22:04 legoktm: checkLocalNames/checkLocalUser finished a few hours ago, I don't have a timestamp (bug 67350)
13:51 godog: reboot ms-be1007, xfs problems on sdn, load at 300+
07:39 legoktm: started running checkLocalUser.php --delete=1 on all CentralAuth wikis for bug 67350
07:37 legoktm: started running checkLocalNames.php --delete=1 on all CentralAuth wikis for bug 67350
02:52 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Jul 12 02:51:47 UTC 2014 (duration 51m 46s)
02:26 logmsgbot: LocalisationUpdate completed (1.24wmf13) at 2014-07-12 02:25:47+00:00
02:16 logmsgbot: LocalisationUpdate completed (1.24wmf12) at 2014-07-12 02:15:33+00:00

July 11

23:44 logmsgbot: awight Synchronized wmf-config: Deploying FundraisingTranslateWorkflow to labs (take 2) (duration: 00m 04s)
23:36 logmsgbot: awight Synchronized wmf-config: Deploying FundraisingTranslateWorkflow to labs (duration: 00m 05s)
23:34 logmsgbot: awight updated /a/common to I862a4afed: Fixup highlightTest.php
22:44 mutante: upgraded libssl on wtp*
22:33 Krinkle: Restarting Jenkins
22:33 Krinkle: Pooled/depooled Jenkins slave on gallium
22:31 Krinkle: jenkins/gallium's weekly w(h)ine hour is here.
21:31 Krinkle: Reloading Zuul to deploy config change I993eba5ab7b70f924a2b925fea7c196db27c4cc3
20:57 ottomata: disabling puppet on analytics1004 (AGH!)
20:51 ottomata: bringing up some hadoop journalnodes (and datanodes)
20:33 mutante: wikitech - graceful apache for ssl cipher list change
18:19 mutante: OTRS - enabled STS, updated SSL cipher list, restarted Apache on iodine
15:15 logmsgbot: hoo Synchronized php-1.24wmf13/extensions/Wikidata/: Fix the wbsearchentities API (duration: 00m 13s)
15:14 logmsgbot: hoo Synchronized php-1.24wmf12/extensions/Wikidata/: Fix the wbsearchentities API (duration: 00m 16s)
13:52 hashar: Jenkins: mediawiki/core change being queued while Jenkins is busy processing some history. That is normal, will resume soon ™
12:07 hashar: Jenkins: dropping history of mwext-Wikibase-testextensions-master as well
12:05 hashar_: Jenkins: manually removing history of mwext-Wikibase-client-tests and mwext-Wikibase-repo-tests. They have not been used since January
08:54 hoo: Started rebuildItemsPerSite for wikidatawiki on terbium
03:31 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Jul 11 03:30:11 UTC 2014 (duration 30m 10s)
03:01 logmsgbot: LocalisationUpdate completed (1.24wmf13) at 2014-07-11 03:00:20+00:00
02:31 logmsgbot: LocalisationUpdate completed (1.24wmf12) at 2014-07-11 02:30:24+00:00

July 10

23:38 logmsgbot: mwalker Finished scap: Updating Core, VE, and GuidedTour for scap, 145400, 145401, 145431, and 145460 (duration: 16m 26s)
23:22 logmsgbot: mwalker Started scap: Updating Core, VE, and GuidedTour for scap, 145400, 145401, 145431, and 145460
20:00 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 14s)
19:46 logmsgbot: reedy Synchronized private: (no message) (duration: 00m 14s)
19:45 csteipp: deployed patch for bug65778
19:43 hashar: Jenkins upgrading Gearman plugin from 0.0.6 to 0.0.7. That fixes the way job labels are registered with Gearman
19:16 hashar: Killed jenkins :-(
18:37 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 13s)
18:36 logmsgbot: reedy Synchronized database lists: (no message) (duration: 00m 14s)
18:10 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.24wmf13
18:02 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikipedias to 1.24wmf12
17:25 logmsgbot: hoo Synchronized php-1.24wmf13/extensions/Wikidata/: Fix a UI issue and two API related flaws (same version as for wmf12) (duration: 00m 09s)
17:21 logmsgbot: hoo Synchronized php-1.24wmf12/extensions/Wikidata/: Fix a UI issue and two API related flaws (duration: 00m 14s)
16:04 godog: restarted pdns in turn on virt1000 and virt0 after opendj ulimit change
15:56 hashar: gallium running a rather long du command in a screen. Need a good figure for how much disk space each job consumes
15:50 logmsgbot: reedy Finished scap: testwiki to 1.24wmf13 and build l10n cache (duration: 32m 09s)
15:18 logmsgbot: reedy Started scap: testwiki to 1.24wmf13 and build l10n cache
15:15 ottomata: reinstalling analytics1026 and analytics1027
14:10 godog: ran swift-dispersion-populate on eqiad and esams swift clusters
14:04 godog: cycle-restarting swift proxy-server on ms-fe to apply config updates
13:09 godog: restart pdns on virt1000
12:48 springle: ongoing schema changes: pl_from_namespace gerrit 117373. on terbium, osc_host.sh processes ok to kill in emergency
12:43 godog: restart opendj on virt1000 with higher ulimit -n
12:29 godog: restarted opendj on virt1000, ran out of fd
10:29 godog: restart profiler-to-carbon on tungsten, seemingly cpu spinning
09:48 logmsgbot: reedy Synchronized database lists: (no message) (duration: 00m 13s)
09:30 logmsgbot: oblivian gracefulled all apaches
09:23 _joe_: doing a tagged run to sync apache config
09:07 hashar: gallium err was July 5th and file was from a minute ago ... ignore me
09:06 hashar: gallium deleted /var/lib/puppet/state/agent_catalog_run.lock from July 5th. Was preventing me from running puppet agent -tv
08:02 logmsgbot: oblivian gracefulled all apaches
07:52 _joe_: doing a tagged run of puppet on all appservers to sync apache config
06:40 bblack: all normally-ulsfo traffic is back on ulsfo
05:53 awight: edit CRM Drupal permissions
03:47 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Jul 10 03:46:36 UTC 2014 (duration 46m 35s)
03:12 logmsgbot: LocalisationUpdate completed (1.24wmf12) at 2014-07-10 03:11:48+00:00
02:49 mutante: argon,netmon1001, graceful'led apaches
02:48 mutante: netmon1001 - DocumentRoot [/etc/apache2/undef] does not exist
02:42 logmsgbot: LocalisationUpdate completed (1.24wmf11) at 2014-07-10 02:41:29+00:00
02:38 mutante: argon,calcium,iron,rhenium,bast1001,oxygen,netmon1001 - upgraded SSL
01:47 mutante: argon - Ignoring file 'puppet_base_2.7' in directory '/etc/apt/preferences.d/
01:41 awight: update crm schema to wmf_civicrm 7020
01:40 awight: update civicrm from 108802336e4d5f4aab9a6dbfa0ea434bddae0060 to 15cf86cb109a448f1982da9c91215eec73f28499
01:38 mutante: potassium,hydrogen,search1016,nitrogen,analytics1024,chromium - upgrade SSL
01:06 bblack: cleared icinga downtimes for ulsfo (we now have some traffic back there)
00:50 logmsgbot: mattflaschen Synchronized php-1.24wmf11/extensions/GuidedTour/: GuidedTour cherry-pick to 1.24wmf11 in support of GettingStarted anonymous editor acquisition test (duration: 00m 09s)
00:05 logmsgbot: maxsem Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/144938 (duration: 00m 04s)

July 9

23:57 logmsgbot: maxsem Finished scap: SWAT, GettingStarted introduced a new message (duration: 26m 31s)
23:44 mutante: deleted systemusers group on neon & mw1077 (to check it doesn't break anything)
23:31 logmsgbot: maxsem Started scap: SWAT, GettingStarted introduced a new message
23:22 logmsgbot: maxsem Synchronized php-1.24wmf11/extensions/GettingStarted/: (no message) (duration: 00m 03s)
23:17 logmsgbot: maxsem Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/144857/ (duration: 00m 04s)
22:42 mark: Enabling PAIX BGP sessions on cr2-ulsfo
22:40 mark: Enabling WMF HQ BGP sessions on cr1-ulsfo
22:38 mark: Enabling TiNet transit links on cr1-ulsfo
22:35 mark: Enabling WMF HQ BGP sessions on cr2-ulsfo
22:34 mark: Enabling NTT and HE transit links on cr2-ulsfo
22:05 mutante: restarted apache on zirconium for config change
20:07 subbu: deployed parsoid 1632288d
18:36 logmsgbot: yurik Synchronized php-1.24wmf12/extensions/: update to JsonConfig, ZeroBanner, ZeroPortal (duration: 01m 22s)
18:29 logmsgbot: yurik Synchronized php-1.24wmf11/extensions/: update to JsonConfig, ZeroBanner, ZeroPortal (duration: 01m 24s)
17:17 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: eswiki cirrus (duration: 00m 04s)
16:51 logmsgbot: csteipp Finished scap: Update CentralAuth for Global Rename (duration: 28m 46s)
16:22 logmsgbot: csteipp Started scap: Update CentralAuth for Global Rename
16:17 mark: ulsfo is now offline
16:16 mark: Shutdown NTT BGP sessions on cr2-ulsfo
16:13 mark: Shutdown TiNet BGP sessions on cr1-ulsfo
16:10 mark: Shutdown IXP BGP sessions on cr2-ulsfo
16:10 mark: Shutdown WMF HQ BGP sessions on cr2-ulsfo
16:09 mark: Shutdown WMF HQ BGP sessions on cr1-ulsfo
16:02 logmsgbot: hoo Synchronized php-1.24wmf12/extensions/Wikidata/: Update Wikibase to fix a fatal and various JS things (duration: 00m 14s)
14:13 hashar: Jenkins: bringing back puppet-compiler02.eqiad.wmflabs node online. /tmp gets filled when running huge catalog compilations, which causes Jenkins to unpool the node :/
13:30 godog: reboot ms-be1005, raid controller confused (?) after disk replacement
12:52 godog: umounted sdg1 on ms-be1005, device disappeared, errors in dmesg
12:35 bblack: enabled amssq47 text frontend cache in pybal for esams
09:39 hashar: Jenkins had a bit of failure earlier due to the massive configuration update of mediawiki-core and mwext jobs. If that fails again the best thing is to stop Jenkins on gallium, wait for it to be killed or force kill -9 the java process, then restart Jenkins. Should sort it out
09:30 hashar: restarted Zuul to clear out stalled items in queue
09:12 hashar: Jenkins being slow because the mediawiki-core* jobs history cache has been wiped out while updating their configuration. Jenkins is busy processing the history :(
09:02 hashar: Jenkins killing slave process on lanthanum. Some job is stalled and unrecoverable.
08:53 godog: upgrade ms-be1013/1014/1015 (zone5) to icehouse swift
08:51 hashar: Jenkins migrating jobs to use $ZUUL_URL instead of git://zuul.eqiad.wmnet. Preparing to scale out Zuul merger to several nodes
08:19 godog: upgrade ms-be1009/1010/1011 (zone4) to swift icehouse
08:04 hashar: Jenkins: granted matanya the ability to manually trigger builds. Use case: the puppet compiler!
08:02 godog: upgrade ms-be1005/1006/1007 (zone3) to swift icehouse
03:37 mutante: ran puppet on neon - false puppet failure alarms
02:55 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Jul 9 02:54:37 UTC 2014 (duration 54m 36s)
02:26 logmsgbot: LocalisationUpdate completed (1.24wmf12) at 2014-07-09 02:25:33+00:00
02:15 logmsgbot: LocalisationUpdate completed (1.24wmf11) at 2014-07-09 02:14:38+00:00
01:26 mutante: Bugzilla - enabled https://en.wikipedia.org/wiki/HTTP_Strict_Transport_Security
00:50 mutante: restarted gitblit service

July 8

22:50 mutante: radon (phab)- package and kernel upgrades, rebooting
20:22 legoktm: finished running migrateAccount.php --attachbroken --attachmissing (bug 61876)
20:07 legoktm: finished migrateAccount.php --safe, now starting migrateAccount.php --attachbroken
20:05 mutante: restarted apache on ytterbium
19:47 K4-713: updated payments fraud filters again
19:47 legoktm: running migrateAccount.php --safe for accounts only existing on one wiki (bug 39817)
19:27 mutante: this should have fixed all the services behind misc. varnish now getting an actual "A" rating on ssllabs
19:20 mutante: arr, i meant "nginx", not varnish
19:15 mutante: restarting varnish on cp1043/cp1044 (misc cluster)
18:55 cmjohnson1: disconnecting serial cable from psw1-c2-eqiad
18:50 csteipp: patch for bug66608 deployed to wmf11/12
18:50 K4-713: updated fraud filters on payments cluster
18:28 logmsgbot: reedy Synchronized robots-private.txt: (no message) (duration: 00m 14s)
18:27 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 15s)
18:20 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non Wikipedias to 1.24wmf12
15:22 logmsgbot: reedy Purged l10n cache for 1.24wmf10
15:21 logmsgbot: reedy Purged l10n cache for 1.24wmf9
15:15 logmsgbot: anomie Synchronized php-1.24wmf11/extensions/Scribunto/: SWAT: Fix regression in os.date and os.time at module scope gerrit:144559 (duration: 00m 10s)
15:14 logmsgbot: anomie Synchronized php-1.24wmf12/extensions/Scribunto/: SWAT: Fix regression in os.date and os.time at module scope gerrit:144511 (duration: 00m 11s)
15:10 logmsgbot: anomie Synchronized php-1.24wmf11/extensions/UploadWizard/UploadWizard.config.php: SWAT: Flickr API is https-only now gerrit:144584 (duration: 00m 10s)
15:04 logmsgbot: anomie Synchronized php-1.24wmf12/extensions/UploadWizard/UploadWizard.config.php: SWAT: Flickr API is https-only now gerrit:144583 (duration: 00m 10s)
13:34 springle: slow transaction rollback in progress on db1001 librenms. other databases not affected, but librenms writes are timing out
13:32 cmjohnson1: replacing disk 6 ms-be1005
13:30 cmjohnson1: replacing disk 4 ms-be1007
12:38 YuviPanda: disregard previous log message, was meant for labs
12:37 YuviPanda: graphite reduced metrics count from 65k to 25k, monitoring io performance
06:57 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: raise db traffic samplers to normal load (duration: 00m 06s)
05:10 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1010, warm up (duration: 00m 06s)
04:51 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1010 for upgrade (duration: 00m 06s)
03:26 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Jul 8 03:25:51 UTC 2014 (duration 25m 50s)
03:00 logmsgbot: LocalisationUpdate completed (1.24wmf12) at 2014-07-08 02:59:33+00:00
02:30 logmsgbot: LocalisationUpdate completed (1.24wmf11) at 2014-07-08 02:29:00+00:00
01:17 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1061, warm up (duration: 00m 06s)
00:57 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1061 for upgrade (duration: 00m 07s)
00:02 ^d: gerrit upgraded to 2.8.1-4-ga1048ce from 2.8.1-2-g724b796, back up. Might be slow for a bit while caches warm.

July 7

23:38 logmsgbot: maxsem Synchronized php-1.24wmf12/extensions/ParserFunctions/: https://gerrit.wikimedia.org/r/#q,144510,n,z (duration: 00m 03s)
23:38 logmsgbot: maxsem Synchronized php-1.24wmf12/includes/StubObject.php: https://gerrit.wikimedia.org/r/#/c/144509/ (duration: 00m 03s)
23:22 logmsgbot: maxsem Synchronized visualeditor-default.dblist: (no message) (duration: 00m 03s)
23:19 logmsgbot: maxsem Synchronized php-1.24wmf12/extensions/GWToolset: (no message) (duration: 00m 03s)
23:18 logmsgbot: maxsem Synchronized php-1.24wmf11/extensions/GWToolset: (no message) (duration: 00m 03s)
23:17 logmsgbot: maxsem Synchronized php-1.24wmf12/extensions/GWToolset: (no message) (duration: 00m 04s)
23:12 logmsgbot: maxsem Synchronized wmf-config/: https://gerrit.wikimedia.org/r/#q,144476,n,z & https://gerrit.wikimedia.org/r/#q,139569,n,z (duration: 00m 05s)
23:04 logmsgbot: maxsem Synchronized wmf-config/: https://gerrit.wikimedia.org/r/#q,144476,n,z & https://gerrit.wikimedia.org/r/#q,139569,n,z (duration: 00m 03s)
22:41 logmsgbot: ori Synchronized wmf-config/mc.php: I8b66e9339: Make app servers connect to nutcracker on port 11212 (duration: 00m 03s)
20:31 logmsgbot: ori Synchronized wmf-config/mc.php: Iea24b092b: Make mw1041 connect to nutcracker on port 11212 (duration: 00m 09s)
20:03 subbu: deployed Parsoid 8ef7b6fe
17:52 legoktm: deleted rows in centralauth's localnames and localuser tables for bug 67548
17:02 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: Cirrus on commons as primary (duration: 00m 04s)
16:34 logmsgbot: aude Finished scap: Update Wikidata to mw1.24-wmf12 branch for group0 wikis (duration: 22m 33s)
16:22 manybubbles: (Cirrus) load tested commons and eswiki over the last hour - both look fine.
16:11 logmsgbot: aude Started scap: Update Wikidata to mw1.24-wmf12 branch for group0 wikis
15:49 bd808: Logstash event volume looks better after restart. Probably related to bug 63490.
15:32 bd808: Restarted logstash on logstash1001 because log volume looked lower than I thought it should be.
15:16 cmjohnson1: reseating PEM2 cr1-eqiad
15:08 godog: powercycled ms-be1007, unresponsive on console and remnants of a stack trace
14:49 manybubbles: (Cirrus) Applying cache warmer configuration that went out last Thursday to all wikipedias.
12:11 hashar: Jenkins job builder e1ddd23 fails for us :/ Moving back to parent commit
12:09 hashar: Updated our Jenkins job builder fork 0972985..e1ddd23
09:40 godog: upgrade ms-be1003/1004/1012 (zone2) to swift icehouse
09:16 _joe_: restarting rhenium, pings but no ssh for 2 days, serial console is blank and unresponsive
09:15 godog: upgrade ms-be1002/1008 (zone1) to swift icehouse
02:53 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Jul 7 02:52:10 UTC 2014 (duration 52m 9s)
02:24 logmsgbot: LocalisationUpdate completed (1.24wmf12) at 2014-07-07 02:23:48+00:00
02:13 logmsgbot: LocalisationUpdate completed (1.24wmf11) at 2014-07-07 02:12:44+00:00

July 6

02:50 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Jul 6 02:49:21 UTC 2014 (duration 49m 20s)
02:25 logmsgbot: LocalisationUpdate completed (1.24wmf12) at 2014-07-06 02:24:08+00:00
02:14 logmsgbot: LocalisationUpdate completed (1.24wmf11) at 2014-07-06 02:13:07+00:00

July 5

02:53 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Jul 5 02:52:04 UTC 2014 (duration 52m 3s)
02:27 logmsgbot: LocalisationUpdate completed (1.24wmf12) at 2014-07-05 02:26:08+00:00
02:16 logmsgbot: LocalisationUpdate completed (1.24wmf11) at 2014-07-05 02:15:05+00:00
01:22 springle: ongoing osc_host.sh schema change jobs on terbium. fine to kill in an emergency

July 4

20:05 hoo: Ran sync-common on fenari to update the docs on noc.wikimedia.org
15:40 _joe_: restarting salt-minion, killing io hungry job on fenari running since jun 30, 00 AM
12:28 akosiaris: executed dist-upgrade on virt1000. Keystone configure phase failed in keystone-manage db-sync and hence dpkg configure failed. It was trying to create an already existing index in the database. Dropped the index, ran dpkg --configure -a to recreate the index (and whatever else keystone-manage db_sync does). All is back to normal.
03:29 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Jul 4 03:28:29 UTC 2014 (duration 28m 28s)
03:03 logmsgbot: LocalisationUpdate completed (1.24wmf12) at 2014-07-04 03:02:49+00:00
02:33 logmsgbot: LocalisationUpdate completed (1.24wmf11) at 2014-07-04 02:32:29+00:00
00:28 gwicke: deployed parsoid config change e21a534 to support VE on the OTRS wiki

July 3

23:40 mutante: osmium - libboost-dev : Depends: libboost1.54-dev but it is not going to be installed
23:33 mutante: rhenium (pmacct / flow) Out of memory: Kill process 3123 (pmacctd) score 1 or sacrifice child
23:22 K4-713: updated payments to c5689f385b2f0a7bdc55c5752010e9eb
23:17 logmsgbot: mwalker Synchronized php-1.24wmf12/extensions/VisualEditor/: Updating VisualEditor for 144081 (duration: 00m 12s)
21:07 logmsgbot: oblivian gracefulled all apaches
20:45 mutante: deleted analytics/kraken branch from ops/puppet via gerrit ui, ack'ed by ottomata
20:12 bd808|deploy: Updated scap to ff04431 (restart-nutcracker script)
19:53 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings-labs.php: (no message) (duration: 00m 14s)
19:48 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 14s)
19:48 logmsgbot: reedy Synchronized docroot and w: (no message) (duration: 00m 30s)
19:47 logmsgbot: reedy Synchronized database lists: (no message) (duration: 00m 20s)
19:36 jgage: rebooting analytics1012 for bios change: cpufreq governor
19:27 ottomata: disabling puppet on hadoop related analytics nodes, preparing for reinstall
19:21 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 14s)
19:16 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.24wmf12
19:12 logmsgbot: reedy Synchronized php-1.24wmf11/languages/Language.php: I039547b867b2eab47692dcc018c95b89975bc65d (duration: 00m 40s)
18:49 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikipedia to 1.24wmf11
18:41 logmsgbot: reedy Finished scap: testwiki to 1.24wmf12 and build l10n cache (duration: 30m 47s)
18:18 ottomata: doing rolling restarts of zookeeper servers and kafka brokers to load up new zk timeout changes
18:10 logmsgbot: reedy Started scap: testwiki to 1.24wmf12 and build l10n cache
18:10 logmsgbot: reedy scap aborted: testwiki to 1.24wmf12 and build l10n cache (duration: 27m 26s)
17:53 godog: reloading librenms, semi-broke it with a syslog search (again)
17:46 godog: reloading librenms, semi-broke it with a syslog search
17:42 logmsgbot: reedy Started scap: testwiki to 1.24wmf12 and build l10n cache
16:38 logmsgbot: maxsem Synchronized php-1.24wmf11/extensions/EventLogging/: bug 67420 (duration: 00m 35s)
16:34 paravoid: apt: uploading nutcracker backport for precise
08:07 hashar: Jenkins restarted
08:00 hashar: upgrading Jenkins (minor version bump 1.554.2 -> 1.554.3)
03:39 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Jul 3 03:38:46 UTC 2014 (duration 38m 45s)
03:03 logmsgbot: LocalisationUpdate completed (1.24wmf11) at 2014-07-03 03:02:15+00:00
02:32 logmsgbot: LocalisationUpdate completed (1.24wmf10) at 2014-07-03 02:31:35+00:00

July 2

23:06 logmsgbot: maxsem Synchronized wmf-config/: (no message) (duration: 00m 07s)
22:37 jgage: rebooting analytics1021 to change bios "system profile" from PPW (OS) to PPW (DAPC)
22:19 logmsgbot: ebernhardson Finished scap: (no message) (duration: 36m 25s)
22:16 jgage: rebooting analytics1022 to check bios cpufreq setting
21:43 logmsgbot: ebernhardson Started scap: (no message)
21:42 logmsgbot: ebernhardson Synchronized php-1.24wmf10/extensions/Mantle/: Sync new Mantle extension in 1.24wmf10 (duration: 00m 20s)
21:40 robh: blog updated to newest release, no downtime
21:38 jgage: rebooting analytics1021 to check bios cpufreq setting
20:56 paravoid: pfw1-eqiad: s/mchenry/lead/; all smtp_out rules have [ polonium lead ] as destination-address now
20:49 paravoid: switching non-wikimedia.org MX to polonium/lead (from polonium/mchenry)
20:16 cscott: updated Parsoid to version 6afcb8df
19:08 logmsgbot: yurik Synchronized php-1.24wmf11/extensions/: update to JsonConfig, ZeroBanner, ZeroPortal - bug fix (duration: 01m 40s)
19:00 logmsgbot: yurik Synchronized php-1.24wmf10/extensions/: update to JsonConfig, ZeroBanner, ZeroPortal - bug fix (duration: 01m 15s)
18:42 logmsgbot: yurik Synchronized php-1.24wmf10/extensions/: Reverting previous update to JsonConfig, ZeroBanner, ZeroPortal (duration: 01m 20s)
18:37 logmsgbot: yurik Synchronized php-1.24wmf10/extensions/: Updating JsonConfig, ZeroBanner, ZeroPortal (duration: 01m 55s)
18:33 logmsgbot: yurik Synchronized php-1.24wmf11/extensions/: Updating JsonConfig, ZeroBanner, ZeroPortal (duration: 01m 38s)
18:33 paravoid: reprepro include, trusty-wikimedia (main/universe): nutcracker, libicu 4.8, libzip 0.11, hhvm, {php,hhvm}-wikidiff2, {php,hhvm}-fss, {php,hhvm}-luasandbox, ffmpeg2theora
18:28 yurikR2: yurik ^ was a noop - comment fix
18:28 logmsgbot: yurik Synchronized wmf-config/CommonSettings.php: (no message) (duration: 01m 04s)
18:26 logmsgbot: yurik Synchronized docroot/bits/WikipediaMobileFirefoxOS/: (no message) (duration: 01m 03s)
16:30 mutante: upgrading jenkins to jenkins_1.554.3_all.deb on the apt repo
15:19 manybubbles: done with SWAT for real this time
15:17 logmsgbot: manybubbles Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 28s)
15:16 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 18s)
15:08 manybubbles: *SWAT* complete
15:07 manybubbles: swap complete - logged off of tin
15:04 logmsgbot: manybubbles Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 05s)
15:04 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 06s)
15:00 logmsgbot: manybubbles Synchronized wmf-config: SWAT Remove two permissions from some editors on ruwiki (duration: 00m 07s)
13:10 hashar: Jenkins being busy deleting history files
13:02 hashar: Jenkins: dropping history of puppet related jobs after 90 days. 136992
12:18 akosiaris: upgraded PHP5 to 5.3.10-1ubuntu3.12+wmf1 on deployment-apache01 and deployment-apache02 (beta)
12:09 akosiaris: upgraded PHP5 to 5.3.10-1ubuntu3.12+wmf1 on test.wikipedia.org
11:05 logmsgbot: hashar Synchronized wmf-config/InitialiseSettings.php: additional upload domain for Erasmus University 143593 bug 67355 (duration: 00m 06s)
08:00 godog: upgrading ms-be1001 to swift icehouse
07:45 godog: umounted (empty and broken) sdk1 from ms-be3003 and wipe its first sectors, no more remounts
03:00 paravoid: rebooting lead
02:57 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Jul 2 02:56:33 UTC 2014 (duration 56m 32s)
02:27 logmsgbot: LocalisationUpdate completed (1.24wmf11) at 2014-07-02 02:26:24+00:00
02:15 logmsgbot: LocalisationUpdate completed (1.24wmf10) at 2014-07-02 02:14:53+00:00

July 1

23:49 K4-713: updated payments cluster to c5689f385b2f0a7
23:43 robh: any francium errors can be ignored, as the software doesn't fully deploy from puppet and it's not in service
23:36 logmsgbot: maxsem Synchronized wmf-config/FeaturedFeedsWMF.php: https://gerrit.wikimedia.org/r/#/c/136316/ now for realz (duration: 00m 04s)
23:34 logmsgbot: maxsem Synchronized wmf-config/FeaturedFeedsWMF.php: https://gerrit.wikimedia.org/r/#/c/136316/ (duration: 00m 04s)
23:09 logmsgbot: maxsem Synchronized php-1.24wmf11/extensions/CentralAuth/: https://gerrit.wikimedia.org/r/#/c/143473/ (duration: 00m 05s)
23:05 logmsgbot: maxsem Synchronized php-1.24wmf11/resources/: https://gerrit.wikimedia.org/r/#/c/142975/ (duration: 00m 05s)
23:04 logmsgbot: maxsem Synchronized php-1.24wmf10/resources/: https://gerrit.wikimedia.org/r/#/c/142975/ (duration: 00m 19s)
21:39 hoo: Set email for re-renamed dewiki account "Kolimak". Email and password got lost during a screwed rename.
20:36 logmsgbot: reedy Synchronized php-1.24wmf11/extensions/WikimediaMessages: bug 67387 (duration: 00m 15s)
20:31 mutante: restarting apache on mw1217
20:27 manybubbles: Adding cache warmers to all Cirrus indexes for group1 wikis with more than one shard except commons (commons is busy, it'll have to wait:)
19:53 logmsgbot: aude Synchronized wmf-config/Wikibase.php: adjust property suggester setting for wikidata (duration: 00m 11s)
19:14 logmsgbot: ori Synchronized php-1.24wmf10/resources/src/jquery.ui-themes/vector/jquery.ui.core.css: Ib09928248: vector/jquery.ui.core.css: Update rule for .ui-helper-hidden-accessible (bug 67243) (duration: 00m 05s)
19:14 logmsgbot: ori Synchronized php-1.24wmf11/resources/src/jquery.ui-themes/vector/jquery.ui.core.css: Ib09928248: vector/jquery.ui.core.css: Update rule for .ui-helper-hidden-accessible (bug 67243) (duration: 00m 06s)
18:42 andrewbogott: adding virt1008 to labs compute pool
18:41 andrewbogott: switching puppet canary from virt1008 to virt1009
18:38 logmsgbot: aude Synchronized wmf-config/InitialiseSettings.php: Enable property suggester on Wikidata (duration: 00m 10s)
18:38 logmsgbot: aude Synchronized wmf-config/Wikibase.php: (no message) (duration: 00m 15s)
18:30 logmsgbot: aaron Synchronized wmf-config/PrivateSettings.php: removed obsolete swift tampa config (duration: 00m 07s)
18:15 logmsgbot: reedy Synchronized docroot and w: (no message) (duration: 00m 18s)
18:14 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias to 1.24wmf11
17:54 logmsgbot: demon Synchronized php-1.24wmf10/extensions/Elastica: Updating to master, fixes fatal error (duration: 00m 07s)
17:45 manybubbles: rebuilding cirrus index for commons to put it into fewer shards - it should be faster this way
17:24 mutante: antimony: git.wikimedia.org]: Ensure set to :present but file type is link so no content will be synced
17:24 logmsgbot: hoo Synchronized wmf-config/: Typos typos typso (duration: 00m 08s)
17:21 mutante: restarting apache on antimony
17:21 mutante: fixing svn.wikimedia.org apache site manually
17:08 springle: restarted mysqld on db1046 m2 slave
17:03 logmsgbot: demon Synchronized cirrus.dblist: Move remaining pool 4 lsearchd wikis (except commons) to Cirrus (duration: 00m 07s)
15:09 manybubbles: done with SWAT deploy
15:06 logmsgbot: manybubbles Synchronized php-1.24wmf11/extensions/CirrusSearch/: SWAT code to set up cache warmers (duration: 00m 05s)
15:04 logmsgbot: manybubbles Synchronized wmf-config: SWAT - cirrus settings - cache warmers and shard counts (duration: 00m 06s)
15:04 ottomata: temporarily disabling puppet on hafnium to test an eventlogging alert
14:27 hashar: Stopping Jenkins it has some corrupted threads
13:16 Jeff_Green: dist-upgrade and reboot tellurium
13:08 Jeff_Green: dist-upgrade and reboot boron
12:23 logmsgbot: reedy Synchronized multiversion/: (no message) (duration: 00m 23s)
12:22 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 18s)
12:16 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings-labs.php: (no message) (duration: 00m 20s)
12:03 logmsgbot: reedy Synchronized wmf-config/interwiki.cdb: (no message) (duration: 00m 13s)
12:00 Reedy: Manually created Echo tables on extension1
11:55 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: touch (duration: 00m 13s)
11:55 logmsgbot: reedy Synchronized database lists: (no message) (duration: 00m 28s)
11:53 Reedy: Manually created wikimania2015wiki database on 10.64.16.18
11:48 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: (no message)
11:48 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: touch (duration: 00m 14s)
11:47 logmsgbot: reedy Synchronized database lists: (no message) (duration: 00m 16s)
10:49 _joe_: nginx restarted on all ulsfo hosts as well, we should be PFS-enabled now
10:38 _joe_: esams restart finished, moving to ulsfo
10:30 _joe_: all eqiad SSL terminators are now PFS enabled. Moving to rolling restarting esams
10:09 _joe_: restarting nginx on ssl100* servers in sequence, to activate PFS
08:47 godog: ms-be3003 sdk1 disk to 0 weight
07:22 legoktm: finished running checkLocalNames.php and checkLocalUser.php for some wikivoyages to clean up bug 66535
07:16 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1067 (duration: 00m 12s)
07:06 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1067 during schema changes (duration: 00m 06s)
06:58 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1063 (duration: 00m 06s)
06:40 legoktm: starting to run checkLocalNames.php and checkLocalUser.php for some wikivoyages to clean up bug 66535
06:34 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1063 during schema changes (duration: 00m 06s)
06:31 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1060 (duration: 00m 06s)
06:29 legoktm: ran fixInvalidStudent.php --wiki=enwiki --courseId=359 for bug 66624
06:13 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1060 during schema changes (duration: 00m 07s)
02:51 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Jul 1 02:50:05 UTC 2014 (duration 50m 4s)
02:24 logmsgbot: LocalisationUpdate completed (1.24wmf11) at 2014-07-01 02:23:49+00:00
02:15 logmsgbot: LocalisationUpdate completed (1.24wmf10) at 2014-07-01 02:14:11+00:00

June 30

23:03 awight: update tools from e894f1f77674b6b101ae0e1644e363ca52e319d9 to d605bdc2aaaef2d4b296a4d9567ed2831db86756
23:02 logmsgbot: ori Synchronized wmf-config: Iba41a37a1: Keep thumbnail guessing enabled (duration: 00m 05s)
22:14 mutante: re-enabled puppet on caesium
21:43 mutante: disabling puppet on caesium
21:35 Reedy: running mwscript updateSpecialPages.php --wiki=enwiki --only=Mostlinkedtemplates --override on terbium
21:25 mutante: fixing releases.wikimedia.org Apache site, delete sites-enabled file broken by puppet, add symlink, graceful
21:00 subbu: deployed parsoid 0b365d516
19:44 _joe_: restarting pybal on lvs1005
19:16 awight: updated payments from a04e536b6923f2228bb7f5fbf2caeed64a888742 to 2b6c527617dcde154cc298dd9697c9d57c9f3620
18:41 awight: updated payments from a8138fefd940ba41812e5c07ca6bc74b63cb9bcf to a04e536b6923f2228bb7f5fbf2caeed64a888742
17:38 manybubbles: Cirrus reindex update! all wikipedias finished their in place reindex except ruwiki - that one is running now. all group1 wikis finished their from mediawiki reindex except commons and mgwiktionary which are running now. started from mediawiki reindex of all wikipedias except for enwiki, itwiki, and cawiki which are already long done.
17:12 logmsgbot: manybubbles Synchronized cirrus.dblist: Enabled CirrusSearch as the default search backend on 30 more wikis - take five (duration: 00m 04s)
17:08 logmsgbot: manybubbles Synchronized wmf-config/: Enable CirrusSearch as the default search backend on 30 more wikis - take four (duration: 00m 04s)
17:08 logmsgbot: manybubbles Synchronized wmf-config/: Enable CirrusSearch as the default search backend on 30 more wikis - for real for real (duration: 00m 04s)
17:07 logmsgbot: manybubbles Synchronized wmf-config/: Enable CirrusSearch as the default search backend on 30 more wikis - for real (duration: 00m 04s)
17:05 logmsgbot: manybubbles Synchronized wmf-config/: Enable CirrusSearch as the default search backend on 30 more wikis (duration: 00m 05s)
15:43 logmsgbot: manybubbles Synchronized php-1.24wmf11/extensions/Wikidata/: (no message) (duration: 00m 09s)
15:35 logmsgbot: manybubbles Synchronized php-1.24wmf11/extensions/VisualEditor/: SWAT Correctly VisualEditor - update full size in MediaSizeWidget (duration: 00m 07s)
15:26 logmsgbot: manybubbles Synchronized wmf-config/: SWAT - disable local uploads on Malay Wiktionary (duration: 00m 04s)
15:23 logmsgbot: manybubbles Synchronized wmf-config/: SWAT - remove completed mediaviewer surveys (duration: 00m 04s)
15:19 _joe_: restarted profiler-to-carbon, stuck since _9_ days, will see that my patch gets deployed.
15:15 logmsgbot: manybubbles Synchronized php-1.24wmf10/extensions/ProofreadPage: SWAT - fix ProofreadPage number of pages (duration: 00m 09s)
14:48 godog: installed new swift ring on esams, decrease ms-be3003/sdk1 weight
14:41 hoo: Cleared out a watchlist with 126652 entries on warwiki to resolve https://bugzilla.wikimedia.org/show_bug.cgi?id=67123
13:31 godog: upgrade ms-fe300[12] to swift icehouse
10:20 hashar: restarting zuul after a puppet change for /etc/zuul/zuul.conf
07:53 godog: upgrading ms-be300[2-4] to swift icehouse
02:49 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Jun 30 02:48:28 UTC 2014 (duration 48m 27s)
02:28 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1061, warm up (duration: 00m 07s)
02:25 logmsgbot: LocalisationUpdate completed (1.24wmf11) at 2014-06-30 02:23:56+00:00
02:15 logmsgbot: LocalisationUpdate completed (1.24wmf10) at 2014-06-30 02:14:23+00:00
01:41 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1061 during schema changes (duration: 00m 07s)

June 29

22:26 hoo: Manually cleared a watchlist on shwikt with 819846 entries, see https://bugzilla.wikimedia.org/show_bug.cgi?id=67123#c7
22:10 hoo: Manually cleared a watchlist with 289436 entries, see https://bugzilla.wikimedia.org/show_bug.cgi?id=67123#c5
16:44 hoo: Jenkins/ Zuul not reacting for at least half an hour now
16:43 awight: update tools from 3a35482ab1fede2ccfcc49a64ec661b0cb013b81 to e894f1f77674b6b101ae0e1644e363ca52e319d9
16:09 awight: updated payments from 6d74002f2634f41f7038daa7357ff6de55ee4880 to a8138fefd940ba41812e5c07ca6bc74b63cb9bcf
02:45 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Jun 29 02:44:35 UTC 2014 (duration 44m 34s)
02:22 logmsgbot: LocalisationUpdate completed (1.24wmf11) at 2014-06-29 02:21:01+00:00

June 28

17:16 ori: restarted lucene on search1016 per _joe_
12:58 manybubbles: Cirrus reindex status: enwiki has almost finished its in place reindex, alphabetical wikipedias are at frwiki, all group1 wikis have finished their in place reindex. all group1 wikis are running from mediawiki reindex. itwiki and cawiki both finished both the in place and from mediawiki reindex. Haven't started alphabetical from mediawiki reindex yet for wikipedias. that is the only
10:40 _joe_: restarting lucene on search1015, stuck. again.
02:47 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Jun 28 02:46:49 UTC 2014 (duration 46m 48s)
02:25 logmsgbot: LocalisationUpdate completed (1.24wmf11) at 2014-06-28 02:24:12+00:00
02:16 logmsgbot: LocalisationUpdate completed (1.24wmf10) at 2014-06-28 02:15:38+00:00

June 27

23:15 awight: deployed payments config
22:57 logmsgbot: csteipp Synchronized php-1.24wmf11/extensions/OAuth/frontend/specialpages/SpecialMWOAuth.php: Fix OAuth Logins for wmf11 (duration: 00m 18s)
20:57 awight: updated crm from 340c43a15a84a9392ad5ef9fc2782243ff140deb to 17439326ca4488ece843a263fc14859b38cff0e9
19:33 hashar: puppet-compiler: removed modules/varnish at root@puppet-compiler02:/opt/wmf/software/compare-puppet-catalogs/external/puppet and reset repo.
19:07 awight: update crm from e2fe03a9cd51e30206d9a1114d62dfbd6960816b to 340c43a15a84a9392ad5ef9fc2782243ff140deb
18:57 logmsgbot: aaron Synchronized wmf-config/PoolCounterSettings-eqiad.php: Pre-set FileRenderExpensive config
18:34 bblack: updated puppet repo on virt0
18:11 mutante: osmium - hhvm : Depends: libdouble-conversion1 but it is not going to be installed
16:49 bblack: updated carbon repo varnish pkg to 3.0.5plus~x-wm6
14:18 hashar: Updated our Jenkins Job Builder fork: e9db73d..0972985
03:31 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Jun 27 03:30:00 UTC 2014 (duration 29m 59s)
03:06 logmsgbot: LocalisationUpdate completed (1.24wmf11) at 2014-06-27 03:05:31+00:00
02:36 logmsgbot: LocalisationUpdate completed (1.24wmf10) at 2014-06-27 02:35:20+00:00

June 26

23:32 manybubbles: Cirrus rebuild progress - started large/high cirrus visibility wikis in group2 - enwiki, cawiki, and itwiki.
23:31 manybubbles: Cirrus rebuild progress - alphabetical wikis in group2 are 2/3 of the way done with reindex - from mediawiki rebuild is maybe 20% done there
23:31 manybubbles: Cirrus rebuild progress - big wikis in group1 are finished with in place reindex and well into from mediawiki rebuild.
23:27 ori: Previous scap included I2cfcfaf06 as well
23:23 logmsgbot: ori Finished scap: CirrusSearch updates: Iefe340729, Ie12418e54, Ie21fb352 (duration: 04m 59s)
23:18 logmsgbot: ori Started scap: CirrusSearch updates: Iefe340729, Ie12418e54, Ie21fb352
23:07 logmsgbot: ori Synchronized wmf-config/InitialiseSettings.php: Ie96265c4f: Add an Erasmus University domain to whitelist (duration: 00m 05s)
23:07 logmsgbot: ori updated /a/common to Ie96265c4f: Add an Erasmus University domain to whitelist
21:51 hashar: Zuul/Jenkins back up and operational.
21:43 hashar: hardkilled Zuul :-( 6 events lost.
21:38 hashar: restarting Zuul it has a bunch of stalled changes
21:32 bblack: enabled cp301[78] frontends in pybal
21:27 hashar: restarting Jenkins
21:26 hashar: Zuul/Jenkins stalled apparently
20:59 logmsgbot: aude Synchronized wmf-config/InitialiseSettings.php: Enable property suggester on testwikidata (duration: 00m 07s)
20:58 logmsgbot: reedy Synchronized php-1.24wmf11/extensions/WikimediaMessages/: (no message) (duration: 00m 15s)
20:57 logmsgbot: reedy Synchronized php-1.24wmf11/extensions/OAuth/: (no message) (duration: 00m 15s)
20:48 logmsgbot: aude Finished scap: Update Wikidata, for enabling property suggester on testwikidata (duration: 31m 57s)
20:16 logmsgbot: aude Started scap: Update Wikidata, for enabling property suggester on testwikidata
19:18 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 14s)
19:14 logmsgbot: reedy Synchronized php-1.24wmf11/extensions/OAuth/: (no message) (duration: 00m 14s)
19:06 RobH: blog is back online after a number of reboots due to raid rebuild issues
18:20 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.24wmf11
18:16 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikipedias to 1.24wmf10
18:15 logmsgbot: reedy Synchronized php-1.24wmf10/includes/api/ApiQueryRecentChanges.php: Id9c316733896a27ce3f6c3e0e5efdf62f7d1ff1b (duration: 00m 14s)
18:08 ottomata: starting new elasticsearch nodes 1017,1018,1019
18:04 RobH: aware of holmium issue (old varnish), in process of repair, blog is down
17:05 logmsgbot: reedy Synchronized php-1.24wmf11/resources/Resources.php: I1237909d7e058137d55e5de9fa4d64fe1f7f9472 (duration: 00m 14s)
17:04 logmsgbot: reedy Finished scap: l10nupdate for 1.24wmf11 for Skins I9395b0e1983122b12bedf003d6398da5ddfd5651 (duration: 16m 35s)
16:48 logmsgbot: reedy Started scap: l10nupdate for 1.24wmf11 for Skins I9395b0e1983122b12bedf003d6398da5ddfd5651
16:46 logmsgbot: reedy Purged l10n cache for 1.24wmf4
16:45 logmsgbot: reedy Purged l10n cache for 1.24wmf5
16:45 logmsgbot: reedy Purged l10n cache for 1.24wmf6
16:44 logmsgbot: reedy Purged l10n cache for 1.24wmf7
16:44 logmsgbot: reedy Purged l10n cache for 1.24wmf8
16:32 logmsgbot: reedy Finished scap: testwiki to 1.24wmf11 and build l10n cache (duration: 27m 20s)
16:05 logmsgbot: reedy Started scap: testwiki to 1.24wmf11 and build l10n cache
16:01 logmsgbot: reedy scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki="testwiki" --list-file="/a/common/wmf-config/extension-list" --output="/tmp/tmp.XetXfk5RPi" ' returned non-zero exit status 1 (duration: 00m 18s)
16:00 logmsgbot: reedy Started scap: testwiki to 1.24wmf11 and build l10n cache
15:56 logmsgbot: reedy scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki="testwiki" --list-file="/a/common/wmf-config/extension-list" --output="/tmp/tmp.9SaYNRzegr" ' returned non-zero exit status 1 (duration: 00m 24s)
15:55 logmsgbot: reedy Started scap: testwiki to 1.24wmf11 and build l10n cache
15:55 logmsgbot: reedy scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki="testwiki" --list-file="/a/common/wmf-config/extension-list" --output="/tmp/tmp.EjEynr9oww" ' returned non-zero exit status 1 (duration: 00m 55s)
15:54 logmsgbot: reedy Started scap: testwiki to 1.24wmf11 and build l10n cache
15:24 cmjohnson1: shutting down holmium to replace disk
14:35 bblack: restarted nova-network on labnet1001
14:26 hashar: updated zuul cloner in git repo and deployed zuul ( tag is wmf-deploy-20140626-1 )
13:54 godog: remounted (broken) sdk1 on ms-be3003
13:32 cmjohnson1: powering down dataset1001 -relocating to 10G rack
13:26 logmsgbot: reedy Synchronized wmf-config/: https://gerrit.wikimedia.org/r/#/c/142142/ No-op for labs (duration: 00m 16s)
12:55 hashar: Jenkins: updates jobs for extensions (phpunit and qunit) to use the mw-run-update-script.sh instead of update.php . That runs update.php twice, the first time logging sql to a file that can be archived. 141851
12:48 mark: Deactivated BGP session to AS13030
11:01 hashar: Replacing operations-puppet-validate job with operations-puppet-pplint-HEAD which is faster and can run concurrently on multiple boxes. 142223
10:52 godog: stopping swift on ms-be3003
10:12 godog: upgrading ms-be3001 to swift icehouse
06:26 springle: ran operations/software maintain-replicas.pl and fedtables.pl on labsdbs for bug 59683
05:54 Tim: on mw1014: reformatted the /tmp partition
05:50 Tim: on mw1014: stopped job runner due to bad /tmp
04:44 ori: mw1014 is sad, has filesystem issues: "Attempt to read block from filesystem resulted in short read while trying to open /tmp". Puppet can't run. Should be depooled.
03:34 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Jun 26 03:33:19 UTC 2014 (duration 33m 18s)
03:02 logmsgbot: LocalisationUpdate completed (1.24wmf10) at 2014-06-26 03:01:43+00:00
02:32 logmsgbot: LocalisationUpdate completed (1.24wmf9) at 2014-06-26 02:31:50+00:00

June 25

23:43 awight: updated crm from f3389daa94e9ad924175bdf0d5bc09c4a26aeb8c to e2fe03a9cd51e30206d9a1114d62dfbd6960816b
23:27 logmsgbot: catrope Finished scap: Updating Wikidata and TimedMediaHandler (duration: 04m 24s)
23:23 logmsgbot: catrope Started scap: Updating Wikidata and TimedMediaHandler
21:22 hashar: puppet fixed on gallium / lanthanum . It was missing a group definition. All fixed! Thanks Chase.
20:53 hashar: puppet broken on gallium.wikimedia.org and lanthanum.eqiad.wmnet . That is being looked at.
20:34 subbu: deployed parsoid 4ef9d6be
19:38 manybubbles: restarted Cirrus scripts after incident - the index rebuilds had to be completely restarted - sanity checking was simply paused
18:54 logmsgbot: yurik Synchronized wmf-config/PrivateSettings.php: Removed obsolete ZRMA user/pswd (duration: 01m 06s)
18:46 logmsgbot: yurik Finished scap: Removing ZeroRatedMobileAccess ext settings, depl latest JsonConfig/ZeroBanner/Portal (duration: 29m 09s)
18:17 logmsgbot: yurik Started scap: Removing ZeroRatedMobileAccess ext settings, depl latest JsonConfig/ZeroBanner/Portal
17:41 logmsgbot: demon Synchronized wmf-config/: Cirrus back on for wikis that had it before. Back to square 1 (duration: 00m 04s)
17:29 mwalker: updating fundraising tools from 5f3a7316b636c0723ce3fa353186d4041b662872 to cdc4b73bd59d27c8d386b6df629b1c574cfed85f
17:06 manybubbles: success!
17:06 logmsgbot: manybubbles Synchronized wmf-config/: try to fix cirrus (duration: 00m 04s)
16:51 andrewbogott: restarted apache on palladium -- _that_ helped
16:49 andrewbogott: it didn't help
16:49 andrewbogott: restarting puppetmaster on palladium
16:42 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: Disable Cirrus everywhere but testwiki (duration: 00m 04s)
16:23 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: Roll back previous Cirrus deploy (duration: 00m 05s)
16:23 logmsgbot: demon Synchronized wmf-config/CommonSettings.php: Roll back previous Cirrus deploy (duration: 00m 04s)
16:16 logmsgbot: demon Synchronized wmf-config/CommonSettings.php: I4c54357a: most remaining wikis getting Cirrus as primary (duration: 00m 04s)
16:16 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: I4c54357a: most remaining wikis getting Cirrus as primary (duration: 00m 04s)
15:27 logmsgbot: manybubbles Synchronized php-1.24wmf9/extensions/Wikidata/: SWAT - fix rtl issue (duration: 00m 10s)
15:23 ottomata: reinstalling elastic1017,1018,1019
15:20 logmsgbot: manybubbles Synchronized php-1.24wmf10/extensions/Wikidata/: SWAT - fix rtl issue (duration: 00m 12s)
14:10 Krinkle: Upgrade npm from v1.4.5 to v1.4.16 on integration-slave1001 and integration-slave1002
14:10 Krinkle: Upgraded npm from v1.4.13 to v1.4.16 on integration-slave1003 to fix https://github.com/npm/npm/issues/5472 and repooling
13:30 Krinkle: Depooling integration-slave1003 as almost every other -npm build on this node fails due to corrupted ~/.npm cache
12:52 manybubbles: cirrus rebuild update: starting from mediawiki reindex step for all alphabetical wikis that have finished so far
12:48 manybubbles: cirrus rebuild update: started rebuilding group1's indexes yesterday. commons and wikidata finished their in place pass and started their from mediawiki pass. The remaining wikis are running their in place pass in alphabetical order and currently on frwiktionary.
12:25 hashar: Upgraded Zuul 9839edb..b7fc126 Brings patchset 20 of Zuul cloner ( https://review.openstack.org/#/c/70373/ )
12:02 akosiaris: upgraded etherpad.wikimedia.org to etherpad-lite 1.4.0
11:12 paravoid: switching inbound email for wikimedia.org to polonium/mchenry
10:35 _joe_: restarted lucene on search1016 as it was stuck there as well, once search1015 is up and running
10:06 _joe_: restarted lucene on search1015, it was stuck
07:37 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: incremental LB bump on db1009 and db1021 traffic samplers (duration: 00m 07s)
06:31 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1021 with traffic sampling (duration: 00m 09s)
06:01 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1021, db1049 to normal load (duration: 00m 07s)
05:05 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1049, warm up (duration: 00m 08s)
02:50 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Jun 25 02:48:53 UTC 2014 (duration 48m 52s)
02:39 springle: xtrabackup clone db1005 to db1049
02:27 logmsgbot: LocalisationUpdate completed (1.24wmf10) at 2014-06-25 02:25:57+00:00
02:19 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1049 (duration: 00m 11s)
02:14 logmsgbot: LocalisationUpdate completed (1.24wmf9) at 2014-06-25 02:13:27+00:00
00:57 chasemp: added dns for wikimania 2015 (gerrit 140186)

June 24

23:28 ori: apache-graceful-all was for Ifc9596cc7
23:28 logmsgbot: ori gracefulled all apaches
23:12 logmsgbot: maxsem Synchronized visualeditor.dblist: https://gerrit.wikimedia.org/r/141702 (duration: 00m 03s)
23:11 logmsgbot: maxsem Synchronized php-1.24wmf10/extensions/MultimediaViewer: (no message) (duration: 00m 05s)
23:02 ori: apache graceful done by me for I543efda24, I29b34689e, and I1c269433e
23:00 logmsgbot: root gracefulled all apaches
20:53 hashar: Jenkins / Zuul deploying experimental pipeline 141827
20:29 RoanKattouw: Restarting Apache on mw1220, getting lots of "Unable to allocate memory for pool" errors
20:29 ottomata: rebooting analytics1021
20:25 ottomata: reinitializing varnish topics with replication factor of 3
20:02 hashar: updated our Jenkins Job Builder copy 416ee7d..e9db73d
19:58 hashar: Upgraded Zuul on gallium.wikimedia.org to install the zuul-cloner of doom. 4f9fd51..9839edb Tagged wmf-deploy-20140624-1 in our repo.
19:39 manybubbles: rebuilding search index for group1 wikis after upgrade today
18:27 logmsgbot: reedy Synchronized docroot and w: (no message) (duration: 00m 14s)
18:25 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non Wikipedias to 1.24wmf10
17:52 logmsgbot: manybubbles Synchronized wmf-config: Drop Cirrus indexes to five shards on rebuild and switch all wikis to new highlighter (duration: 00m 04s)
17:44 logmsgbot: aaron Synchronized wmf-config/InitialiseSettings.php: Maintenance reports limit incremental increase (duration: 00m 08s)
17:37 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: reduce db1021 load (duration: 00m 10s)
17:06 akosiaris: restarted hadoop yarn on analytics1013
15:36 bblack: VCL compilation is now in-sync everywhere but bits caches...
15:21 logmsgbot: manybubbles Synchronized php-1.24wmf9/extensions/CirrusSearch/: SWAT Stop Cirrus from breaking RandomRootPage (duration: 00m 06s)
15:15 logmsgbot: manybubbles Synchronized php-1.24wmf10/extensions/CirrusSearch/: SWAT Stop Cirrus from breaking RandomRootPage (duration: 00m 04s)
15:06 logmsgbot: manybubbles Synchronized wmf-config/: SWAT - visual editor config changes and retire some beta features (duration: 00m 04s)
15:05 logmsgbot: manybubbles Synchronized visualeditor-default.dblist: SWAT - Enable VisualEditor by default on Wikimania 2014 wiki (duration: 00m 04s)
15:05 logmsgbot: manybubbles Synchronized visualeditor.dblist: SWAT - Enable VisualEditor by default on Wikimania 2014 wiki (duration: 00m 06s)
15:03 logmsgbot: manybubbles Synchronized php-1.24wmf10/includes/config/GlobalVarConfig.php: SWAT - GlobalVarConfig should not throw exceptions for null-valued config settings (duration: 00m 05s)
14:53 logmsgbot: hoo Synchronized wmf-config/CommonSettings.php: Enable Wikibase property suggester on beta (duration: 00m 07s)
14:15 hashar: Jenkins: set SMTP server to wiki-mail.wikimedia.org; smtp.pmtpa.wmnet got deleted
14:07 hashar: Jenkins is back
13:59 Krinkle: Build logs in Jenkins incorrectly render ansi color codes since it was upgraded to 0.4.0. Downgrading to 0.3.1 and restarting Jenkins.
09:55 godog: removing old salt master cache on palladium, moved yesterday out of the way
06:59 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1049 (duration: 00m 08s)
06:23 Nemo_bis: FYI no gerrit mail since yesterday 15 UTC, https://bugzilla.wikimedia.org/show_bug.cgi?id=67018
02:48 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Jun 24 02:47:14 UTC 2014 (duration 47m 13s)
02:26 logmsgbot: LocalisationUpdate completed (1.24wmf10) at 2014-06-24 02:25:43+00:00
02:14 logmsgbot: LocalisationUpdate completed (1.24wmf9) at 2014-06-24 02:13:38+00:00

June 23

23:12 logmsgbot: maxsem Synchronized php-1.24wmf9/extensions/MobileFrontend/: https://gerrit.wikimedia.org/r/#/c/141102/ (duration: 00m 06s)
23:12 logmsgbot: maxsem Synchronized php-1.24wmf10/extensions/MobileFrontend/: https://gerrit.wikimedia.org/r/#/c/141102/ (duration: 00m 05s)
23:03 logmsgbot: maxsem Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/140897/ (duration: 00m 04s)
20:05 subbu: deployed parsoid 392435a2 (deploy sha db94f88c)
19:22 hashar: gallium / zuul : deleting /var/lib/zuul/git old Zuul repositories. They have been migrated to /srv/ssd/zuul/git/ ages ago
19:20 jgage: ms-be3003 full root partition fixed, swift had written to /srv/swift-storage/sdk1 onto root due to unmounted sdk1
17:38 bblack: lvs1005:eth3 was negotiated to 100mbps (???) - disable -> enable on switch fixed it
17:36 godog: restarted salt-master on palladium, suspected job cleanup stuck
17:04 bd808: Fixed dangling symlink for /etc/apache2/sites-enabled/logstash.wikimedia.org on logstash1001 by deleting symlink and forcing puppet run
16:49 godog: added mw1149-52 back to pybal apache
16:33 paravoid: switched inbound mail for all non-wikimedia.org domains from mchenry/sodium to polonium/mchenry (~16:00 + <= 1h TTL UTC)
15:13 logmsgbot: anomie Synchronized wmf-config/InitialiseSettings.php: SWAT: Add a Library of Congress domain to wgCopyUploadsDomains gerrit:141308 (duration: 00m 14s)
15:11 logmsgbot: anomie Synchronized wmf-config/InitialiseSettings.php: SWAT: Adjust group rights on ruwiki gerrit:140910 (duration: 00m 14s)
15:10 logmsgbot: anomie Synchronized php-1.24wmf9/includes/api/ApiExpandTemplates.php: SWAT: Fix fatal in API action=expandtemplates with Scribunto gerrit:141416 (duration: 00m 15s)
15:04 logmsgbot: anomie Synchronized php-1.24wmf10/includes/api/ApiExpandTemplates.php: SWAT: Fix fatal in API action=expandtemplates with Scribunto gerrit:141417 (duration: 00m 14s)
14:55 andrewbogott: reenabling puppet on labstore1001, hoping it doesn't break labs
14:38 hashar: Further upgraded Zuul up to upstream b8c24ce + our local hacks. Git tag is wmf-deploy-20140623-4
14:14 hashar: upgraded Zuul by one commit (which introduces swift support, though it is disabled on our setup via a custom hack)
13:20 paravoid: switching outbound email to polonium
12:17 manybubbles: rebuilding Cirrus index on group0 wikis to pick up changes like results boosting from categories and wikitext search
10:37 godog: powering down maerlant, decom-med
10:05 godog: hardreset maerlant, stuck on console and no ssh
09:40 paravoid: killing sodium's lighttpd compress cache
07:21 _joe_: powercycled cp4018, stuck with a blank console
02:59 springle: moving lighttpd compressed archives on sodium off / to regain inodes
02:46 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Jun 23 02:45:24 UTC 2014 (duration 45m 23s)
02:26 logmsgbot: LocalisationUpdate completed (1.24wmf10) at 2014-06-23 02:25:53+00:00
02:14 logmsgbot: LocalisationUpdate completed (1.24wmf9) at 2014-06-23 02:13:53+00:00
00:38 legoktm: mail is stuck, lots of mails queued in exim

June 22

22:25 _joe_: restarted apache on strontium, passenger crashed (again).
21:06 logmsgbot: hoo Synchronized wmf-config/InitialiseSettings-labs.php: For cluster consistency... (duration: 00m 08s)
19:24 godog: silenced LVS healthcheck on rendering.svc until 23:23 UTC
02:42 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Jun 22 02:41:30 UTC 2014 (duration 41m 29s)
02:24 logmsgbot: LocalisationUpdate completed (1.24wmf10) at 2014-06-22 02:23:50+00:00
02:13 logmsgbot: LocalisationUpdate completed (1.24wmf9) at 2014-06-22 02:12:50+00:00

June 21

16:12 _joe_: restarted ms-be1012, see http://paste.debian.net/106247/ for console output
02:46 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Jun 21 02:45:17 UTC 2014 (duration 45m 16s)
02:30 logmsgbot: LocalisationUpdate completed (1.24wmf10) at 2014-06-21 02:28:59+00:00
02:18 logmsgbot: LocalisationUpdate completed (1.24wmf9) at 2014-06-21 02:17:18+00:00

June 20

22:58 logmsgbot: demon Synchronized wmf-config/CommonSettings.php: Load nostalgia from skins rather than extensions when it exists (duration: 00m 04s)
20:23 logmsgbot: demon Synchronized wmf-config/CommonSettings.php: wfGetIP removal, code cleanup (duration: 00m 04s)
20:22 logmsgbot: demon Synchronized wmf-config/throttle.php: wfGetIP removal, code cleanup (duration: 00m 05s)
17:11 godog: expanded palladium's root to avoid filling up, suspected salt-master (RT #7721)
16:53 bd808: Ran /usr/local/bin/sync-common on fenari to verify fix for bug 66844. It works!
15:16 logmsgbot: reedy Synchronized wmf-config/extension-list-labs: (no message) (duration: 00m 16s)
11:00 _joe_: restarted apache on palladium, passenger was dead and filling error logs
03:35 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Jun 20 03:34:06 UTC 2014 (duration 34m 5s)
03:19 logmsgbot: LocalisationUpdate completed (1.24wmf10) at 2014-06-20 03:18:36+00:00
02:35 logmsgbot: LocalisationUpdate completed (1.24wmf9) at 2014-06-20 02:34:14+00:00
00:06 MaxSem: Running clearMessageBlobs.php

June 19

23:52 MaxSem: that was a touch
23:51 logmsgbot: maxsem Synchronized php-1.24wmf9/extensions/MultimediaViewer/: (no message) (duration: 00m 04s)
23:38 logmsgbot: maxsem Finished scap: Mark Traceur made me do it! (duration: 15m 14s)
23:23 logmsgbot: maxsem Started scap: Mark Traceur made me do it!
23:20 logmsgbot: maxsem Synchronized php-1.24wmf10/extensions/MobileFrontend/: (no message) (duration: 00m 03s)
23:19 logmsgbot: maxsem Synchronized php-1.24wmf9/extensions/MobileFrontend/: (no message) (duration: 00m 04s)
23:18 logmsgbot: maxsem Synchronized php-1.24wmf10/extensions/CirrusSearch/: (no message) (duration: 00m 03s)
23:18 logmsgbot: maxsem Synchronized php-1.24wmf10/extensions/VisualEditor/: (no message) (duration: 00m 04s)
23:16 logmsgbot: maxsem Synchronized php-1.24wmf9/extensions/VisualEditor/: (no message) (duration: 00m 04s)
23:14 bd808: Restarted logstash service on logstash1001
23:06 logmsgbot: maxsem Synchronized php-1.24wmf10/extensions/MultimediaViewer/: (no message) (duration: 00m 05s)
23:06 logmsgbot: maxsem Synchronized php-1.24wmf9/extensions/MultimediaViewer/: (no message) (duration: 00m 05s)
22:52 bd808: Updated scap to 792a572
21:21 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: Set wmgMediaViewerBeta to false everywhere (duration: 00m 15s)
21:16 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.24wmf10
21:07 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikipedias to 1.24wmf9 take 2
19:31 mutante: started mysql on pc1002
19:17 MatmaRex: <RobH> powercycled pc1002
19:15 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias back to 1.24wmf8
19:06 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikipedias to 1.24wmf9
19:00 logmsgbot: reedy Finished scap: scap 1.24wmf10 take 2... (duration: 22m 59s)
18:37 ori: neon, logstash100x, zirconium, stat1001, netmon1001: replaced sites-enabled symlinks with their targets and forced puppet-run to clean up after Iddc778a28
18:37 logmsgbot: reedy Started scap: scap 1.24wmf10 take 2...
18:08 logmsgbot: reedy Started scap: testwiki to 1.24wmf10 and build l10n cache
17:29 logmsgbot: hoo Synchronized php-1.24wmf9/extensions/Wikidata/: Update Wikidata to fix the entity selector (duration: 00m 09s)
15:51 mutante: powercycling elastic1017 (went down and no console output)
15:13 godog: removed old pmtpa swift stats from graphite
15:04 logmsgbot: anomie Synchronized wmf-config/InitialiseSettings.php: SWAT: Put testwiki namespaces in the right place gerrit:140261 (duration: 00m 14s)
15:04 logmsgbot: anomie Synchronized wmf-config/InitialiseSettings.php: SWAT: Put testwiki namespaces in the right place gerrit:140261 (duration: 00m 15s)
15:02 logmsgbot: anomie Synchronized wmf-config/throttle.php: SWAT: Raise account creation limit for Telugu Wikipedia workshop on June 23 gerrit:140669 (duration: 00m 15s)
14:30 cmjohnson1: replacing failed disk slot3 es1006
13:01 _joe_: re-enable puppet on lvs1003
11:26 logmsgbot: reedy Synchronized wmf-config/: touch (duration: 00m 15s)
11:25 logmsgbot: reedy Synchronized commonsuploads.dblist: (no message) (duration: 00m 15s)
11:00 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 14s)
10:53 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 15s)
10:52 logmsgbot: reedy Synchronized docroot and w: (no message) (duration: 00m 15s)
10:28 Reedy: manually ran sync-common tin on fenari
10:09 logmsgbot: reedy Synchronized docroot/noc: (no message) (duration: 00m 15s)
10:07 logmsgbot: reedy Synchronized docroot and w: (no message) (duration: 00m 14s)
10:04 logmsgbot: reedy Synchronized wmf-config/: I248fa7b98a8a0eea943c6643d1bf9c2ed36296b8 (duration: 00m 15s)
03:34 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Jun 19 03:33:36 UTC 2014 (duration 33m 35s)
02:46 logmsgbot: LocalisationUpdate completed (1.24wmf9) at 2014-06-19 02:45:51+00:00
02:24 logmsgbot: LocalisationUpdate completed (1.24wmf8) at 2014-06-19 02:23:42+00:00

June 18

23:09 awight: update crm from 26460d6eaec26861661322df8e9f07a8b0519677 to f3389daa94e9ad924175bdf0d5bc09c4a26aeb8c
23:05 logmsgbot: maxsem Synchronized php-1.24wmf9/extensions/VisualEditor/: https://gerrit.wikimedia.org/r/#/c/140563/ (duration: 00m 03s)
23:03 logmsgbot: maxsem Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/140250/ (duration: 00m 04s)
22:30 bblack: rebooting lvs1004 + lvs1005
22:10 bblack: turning lvs1003 pybal back on
21:52 bblack: disable pybal on lvs1003, since 1006 seems to have all its interfaces :P
21:34 bblack: rebooting lvs1003 for kernel/bios stuff
21:00 bblack: rebooting lvs1006 for kernel/bios stuff
20:23 subbu: deployed Parsoid 88a61f81 (deploy repo sha 470a5ef2)
17:39 logmsgbot: yurik Synchronized docroot/bits/WikipediaMobileFirefoxOS/: (no message) (duration: 01m 09s)
17:35 logmsgbot: yurik Synchronized php-1.24wmf9/extensions/: Updating JsonConfig, ZeroBanner, ZeroPortal (duration: 01m 14s)
17:32 logmsgbot: yurik Synchronized php-1.24wmf8/extensions/: Updating JsonConfig, ZeroBanner, ZeroPortal (duration: 01m 15s)
17:26 logmsgbot: yurik Synchronized docroot/bits/WikipediaMobileFirefoxOS/: (no message) (duration: 01m 04s)
17:10 RobH: magnesium back to proper function
17:09 RobH: apache2ctl restart on magnesium, racktables wasn't working
16:24 bblack: rebooting lvs4001 for kernel + num_queues
16:19 bblack: rebooting lvs4002 for kernel + num_queues
15:20 bblack: rebooting lvs4003 for kernel / num_queues updates
15:17 bblack: rebooting lvs4004 for kernel / num_queues updates
15:10 logmsgbot: anomie Synchronized php-1.24wmf9/extensions/Scribunto/engines/LuaCommon/SiteLibrary.php: SWAT: Fix Scribunto-related exceptions on testwiki gerrit:140370 (duration: 00m 14s)
13:40 _joe_: restarted profiler-to-carbon, stuck (again) waiting for mwprof
13:25 springle: script rt-7708.pl hitting m2-master eventlogging from terbium for RT #7708. fine to kill if necessary
10:01 hashar: Updated our Jenkins job builder fork: 8cbc93a..416ee7d
08:26 _joe_: disk is gone, powering down ms-be1007, opening ticket for disk replacement
08:24 _joe_: stopped swift on ms-be1007, unmounting volume to check for repair
06:01 springle: restarted gmetad on nickel while unbreaking the mysql graphs I broke on ganglia
04:30 ori: enabled puppet on polonium (was disabled but nothing in SAL)
02:59 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Jun 18 02:58:22 UTC 2014 (duration 58m 21s)
02:26 logmsgbot: LocalisationUpdate completed (1.24wmf9) at 2014-06-18 02:25:03+00:00
02:23 MaxSem: searchidx1001 outta sync - running sync-common
02:14 logmsgbot: LocalisationUpdate completed (1.24wmf8) at 2014-06-18 02:13:34+00:00
02:05 Krinkle: Nevermind, graphite.wikimedia.org going down is due to overload which recovers eventually (it just has). Has become SNAFU/FIXME.
02:02 Krinkle: graphite.wikimedia.org is down with HTTP 502 Bad Gateway errors
01:49 ori: puppet freshness on tungsten and stat1001 can be fixed with https://gerrit.wikimedia.org/r/#/c/140269/

June 17

20:19 logmsgbot: maxsem Synchronized php-1.24wmf9/extensions/MobileFrontend/: https://gerrit.wikimedia.org/r/#/c/140178/ (duration: 00m 04s)
20:17 logmsgbot: maxsem Synchronized php-1.24wmf8/extensions/MobileFrontend/: https://gerrit.wikimedia.org/r/#/c/140178/ (duration: 00m 05s)
20:01 logmsgbot: hoo Synchronized php-1.24wmf9/extensions/Wikidata/: Update Wikidata to fix editing site links (duration: 00m 24s)
18:23 logmsgbot: reedy Synchronized docroot and w: (no message) (duration: 00m 16s)
18:22 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non Wikipedias to 1.24wmf9
18:05 logmsgbot: demon Synchronized wmf-config/PoolCounterSettings-eqiad.php: Limit regex searches before they start landing on wikis (duration: 00m 04s)
16:32 bblack: enabled amssq31-46 esams text frontend varnishes in pybal (were misconfigured; wrong domainname)
15:18 logmsgbot: manybubbles Synchronized php-1.24wmf8/extensions/CirrusSearch/: SWAT - Fix Cirrus Special:Random (duration: 00m 04s)
15:13 logmsgbot: manybubbles Synchronized php-1.24wmf9/extensions/CirrusSearch/: SWAT - Fix Cirrus Special:Random (duration: 00m 04s)
15:02 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: SWAT - lower event logging rate for mediaviewer (duration: 00m 05s)
13:51 _joe_: production puppet masters upgraded to puppet 3
07:12 springle: starting updateCollation on s3 frwikinews from tin
07:07 logmsgbot: springle Synchronized wmf-config/InitialiseSettings.php: $wgCategoryCollation to uca-fr on frwikinews (duration: 00m 07s)
03:20 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Jun 17 03:19:12 UTC 2014 (duration 19m 11s)
02:35 logmsgbot: LocalisationUpdate completed (1.24wmf9) at 2014-06-17 02:34:09+00:00
02:23 logmsgbot: LocalisationUpdate completed (1.24wmf8) at 2014-06-17 02:22:46+00:00

June 16

23:12 logmsgbot: maxsem Synchronized php-1.24wmf8/extensions/MobileFrontend/: https://gerrit.wikimedia.org/r/#/c/139562/ (duration: 00m 05s)
23:11 logmsgbot: maxsem Synchronized php-1.24wmf9/extensions/MobileFrontend/: https://gerrit.wikimedia.org/r/#/c/139562/ (duration: 00m 06s)
23:05 logmsgbot: maxsem Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/139888/ (duration: 00m 08s)
21:30 ori: upgraded eventlogging to 3012aad
20:45 ori: updated eventlogging to b4b42effc6
17:36 logmsgbot: csteipp Synchronized php-1.24wmf8/extensions/EducationProgram/includes/api/ApiAddStudents.php: Bug66631 (duration: 00m 05s)
17:34 logmsgbot: csteipp Synchronized php-1.24wmf9/extensions/EducationProgram/includes/api/ApiAddStudents.php: (no message) (duration: 00m 05s)
15:59 godog: manually ran update-ubuntu-mirror on carbon, successful
15:57 awight: updated crm from e52a4eb1bfab622f612dc84f687678fff1fdbc04 to 26460d6eaec26861661322df8e9f07a8b0519677
15:30 ottomata: reinstalling analytics1018
13:38 twkozlowski: _joe_ also working on recovering the list which was deleted by mistake
13:37 _joe_: closed wikimedia-de-by list
13:13 _joe_: removing chip-l mailing list as per bug #63877
13:03 godog: restarting swift-proxy-server on ms-fe1001 to test statsd metrics
10:47 godog: restarting swift-proxy-server on ms-fe3002 to test statsd metrics
10:23 hoo: Touched all 1.24wmf8 extension/wikidata files and ran sync-common after that on mw1070
10:18 logmsgbot: hoo Synchronized php-1.24wmf8/extensions/Wikidata/: Update Wikidata to fix a suggester bug (duration: 00m 09s)
10:16 godog: restarting swift-proxy-server on ms-fe3001 to test statsd metrics
10:12 logmsgbot: hoo Synchronized php-1.24wmf9/extensions/Wikidata/: Update Wikidata to fix a suggester bug (duration: 00m 13s)
09:29 apergos: restarted search1015 about 15 mins ago, it's now recovered afaict, restarted search1016, it's doing index setup now
03:00 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Jun 16 02:59:43 UTC 2014 (duration 59m 42s)
02:27 logmsgbot: LocalisationUpdate completed (1.24wmf9) at 2014-06-16 02:26:05+00:00
02:15 logmsgbot: LocalisationUpdate completed (1.24wmf8) at 2014-06-16 02:14:38+00:00

June 15

17:44 logmsgbot: hoo Synchronized php-1.24wmf8/extensions/Wikidata/: Touched various JavaScripts (duration: 00m 09s)
14:26 Reedy: Job runners were restarted on tmh100[12] and are now processing jobs
14:15 godog: extended palladium root partition by +20G
13:50 _joe|away: restarted mw-job-runner on tmh1001
10:02 paravoid: nuked ms-be1001 sdj with zeros, reformatting and placing into production again
02:59 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Jun 15 02:58:21 UTC 2014 (duration 58m 20s)
02:27 logmsgbot: LocalisationUpdate completed (1.24wmf9) at 2014-06-15 02:26:03+00:00
02:15 logmsgbot: LocalisationUpdate completed (1.24wmf8) at 2014-06-15 02:14:46+00:00

June 14

22:27 bawolff: video scalers seem to have stopped doing webVideoTranscode jobs
20:24 legoktm: ran "delete from ep_students where student_user_id =0 limit 1;" on enwiki for bug 66624
20:10 legoktm: ran "delete from ep_users_per_course where upc_user_id=0 limit 1" on enwiki for bug 66624
19:19 paravoid: unmounting ms-be1001's sdj1, corrupted filesystem
18:46 paravoid: rebooting ms-be1001, XFS: Internal error XFS_WANT_CORRUPTED_RETURN, lots of processes in D
03:08 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Jun 14 03:07:14 UTC 2014 (duration 7m 13s)
02:37 logmsgbot: LocalisationUpdate completed (1.24wmf9) at 2014-06-14 02:36:35+00:00
02:36 bblack: enabled amssq43-46 frontends (esams text varnish) in pybal
02:17 logmsgbot: LocalisationUpdate completed (1.24wmf8) at 2014-06-14 02:16:38+00:00
00:46 bblack: enabled amssq39-42 frontends (esams text varnish) in pybal

June 13

22:01 manybubbles: logstash1002 seems to be properly restoring nodes to itself. I'll monitor it for the next few minutes but I believe my work here is done.
21:55 manybubbles: bouncing logstash1002 because it seems stuck. not sure why. no useful logs.
21:07 bblack: turned on amssq35-38 text frontends in esams (in pybal)
20:57 awight: update crm from c38296add61421f87e12cb5b4f3dd68bdf2340db to e52a4eb1bfab622f612dc84f687678fff1fdbc04
20:23 bblack: turned on amssq31-34 text frontends in esams
18:41 mutante: DNS update - removing manutius' public IP
18:31 mutante: shutting down manutius, decom
18:22 logmsgbot: ori Synchronized php-1.24wmf9/extensions/Math: I498053de4: Fix the VisualEditor parts of Math-wmf9 with a working cherry pick of I7d5e1174 (duration: 00m 08s)
16:55 logmsgbot: hoo Synchronized php-1.24wmf8/extensions/Wikidata/: Update Wikidata to fix JavaScript issues (duration: 00m 09s)
16:45 logmsgbot: hoo Synchronized php-1.24wmf9/extensions/Wikidata/: Update Wikidata to fix JavaScript issues (duration: 00m 10s)
16:31 Reedy: Finished creating mathoid tables on all wikis
16:26 Reedy: Creating mathoid tables on all wikis
16:11 mutante: manutius - decom, delete salt key, puppet cert, stopped services...
15:17 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Jun 13 15:16:09 UTC 2014 (duration 53m 11s)
14:59 logmsgbot: reedy Synchronized wmf-config/: Disable MW_MATH_SOURCE for now (duration: 00m 15s)
14:46 logmsgbot: LocalisationUpdate completed (1.24wmf9) at 2014-06-13 14:45:40+00:00
14:36 logmsgbot: LocalisationUpdate completed (1.24wmf8) at 2014-06-13 14:35:41+00:00
13:00 bblack: moved ge-3/0/0 - 3/0/15 from public to private vlan on cs2-esams (amssq31-46)
10:02 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1049 (duration: 00m 12s)
09:56 paravoid: deactivating eqiad<->HE, excessive packet loss/latency
09:33 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1071 (duration: 00m 07s)
08:10 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1070, depool db1071 (duration: 00m 12s)
07:48 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1066, depool db1070 (duration: 00m 07s)
07:19 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1065, depool db1066 (duration: 00m 13s)
06:51 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1062, depool db1065 (duration: 00m 09s)
06:09 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1062 (duration: 00m 12s)
05:58 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1051 (duration: 00m 14s)
03:54 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Jun 13 03:53:17 UTC 2014 (duration 53m 16s)
03:12 logmsgbot: LocalisationUpdate completed (1.24wmf9) at 2014-06-13 03:11:28+00:00
02:35 logmsgbot: LocalisationUpdate completed (1.24wmf8) at 2014-06-13 02:34:41+00:00
00:45 logmsgbot: ori Synchronized php-1.24wmf8/extensions/Math: Reverting Extension:Math to 1bb3bfa3b5656 (duration: 00m 05s)
00:44 logmsgbot: ori Synchronized php-1.24wmf9/extensions/Math: Reverting Extension:Math to 1bb3bfa3b5656 (duration: 00m 06s)
00:41 ori: removed Physikerwelt and Frédéric Wang from extension-Math group in Gerrit pending further inquiry into recent changes
00:38 logmsgbot: ori Finished scap: fix any lingering inconsistencies in the state of the app servers (see https://gerrit.wikimedia.org/r/139089) (duration: 26m 59s)
00:11 logmsgbot: ori Started scap: fix any lingering inconsistencies in the state of the app servers (see https://gerrit.wikimedia.org/r/139089)

June 12

23:35 logmsgbot: ori Synchronized php-1.24wmf8/extensions/MobileFrontend: Re-syncing after submodule update (duration: 00m 06s)
23:34 ori: ran sync-common on mw1151
23:17 logmsgbot: catrope Synchronized php-1.24wmf9/extensions/MobileFrontend: (no message) (duration: 00m 04s)
23:17 logmsgbot: catrope Synchronized php-1.24wmf8/extensions/MobileFrontend: (no message) (duration: 00m 05s)
23:17 logmsgbot: catrope Synchronized php-1.24wmf8/extensions/VisualEditor: (no message) (duration: 00m 04s)
23:07 Krinkle: integration-slave1003 is failing npm-test builds due to a cache corruption (filed as https://github.com/npm/npm/issues/5472). Manually cleared /mnt/home/jenkins-deploy/.npm/async on integration-slave1003.eqiad.wmflabs for now.
23:05 MaxSem: Purging PageImages data from Wikibooks and Wikisource
22:59 logmsgbot: catrope Synchronized wmf-config/: (no message) (duration: 00m 04s)
22:46 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: disable MW_MATH_MATHML until mathoid table is created (BUG 66492) (duration: 00m 04s)
22:31 logmsgbot: ori Synchronized php-1.24wmf8/extensions/WikimediaEvents: Update WikimediaEvents for Ibd36da416 (duration: 00m 03s)
22:30 logmsgbot: ori Synchronized php-1.24wmf9/extensions/WikimediaEvents: Update WikimediaEvents for Ibd36da416 (duration: 00m 03s)
21:11 logmsgbot: yurik Synchronized php-1.24wmf9/extensions/JsonConfig/: JsonConfig ext update, fixing bug 66555 (duration: 01m 03s)
21:10 logmsgbot: yurik Synchronized php-1.24wmf8/extensions/JsonConfig/: JsonConfig ext update, fixing bug 66555 (duration: 01m 04s)
19:25 ottomata: stopping puppet on an18
19:19 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.24wmf9
19:19 ottomata: starting hadoop decom of analytics1018. This node will become a Kafka broker
19:04 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias to 1.24wmf8
19:04 MaxSem: Dropping old GeoData tables from everywhere
18:52 logmsgbot: reedy Finished scap: 1.24wmf9 staging take 2... (duration: 15m 20s)
18:37 logmsgbot: reedy Started scap: 1.24wmf9 staging take 2...
18:06 logmsgbot: reedy Started scap: testwiki to 1.24wmf9 and build l10n cache
17:49 ottomata: disabling puppet on analytics1012 and analytics1022
17:48 ottomata: starting some kafka failure tests, I have scheduled downtime for some service checks in icinga, hopefully this will not be noisy
17:41 ottomata: restarting elasticsearch on logstash servers
17:34 logmsgbot: yurik Synchronized wmf-config/InitialiseSettings.php: Enabling new zero ext on all wikis (duration: 01m 03s)
17:22 logmsgbot: yurik Synchronized wmf-config/InitialiseSettings.php: Attempting to enable new zero ext on zerowiki & ruwiki - take3 (duration: 01m 04s)
17:06 logmsgbot: yurik Synchronized php-1.24wmf8/extensions/: (no message) (duration: 01m 12s)
17:05 greg-g: yurik's blank sync message could have been: Deploying new JsonConfig,ZeroBanner,ZeroPortal extensions (refactoring ZeroRatedMobileAccess ext)
17:04 logmsgbot: yurik Synchronized php-1.24wmf7/extensions/: (no message) (duration: 01m 15s)
15:31 logmsgbot: manybubbles Synchronized wmf-config/throttle.php: SWAT: Raise account creation limit for eswiki outreach event (duration: 00m 05s)
13:39 bblack: enabling cp301[34] esams mobile frontends in pybal
11:18 hashar: Gerrit: created mediawiki/services/cxserver/deploy repository for Nikerabbit and kart_
05:52 paravoid: cr1-esams/cr2-knams: dismantling amslvs BGP peerings
05:46 paravoid: amslvs[1234]: stopping pybal
03:40 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Jun 12 03:39:07 UTC 2014 (duration 39m 6s)
03:03 logmsgbot: LocalisationUpdate completed (1.24wmf8) at 2014-06-12 03:02:09+00:00
02:58 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1051 (duration: 01m 08s)
02:33 logmsgbot: LocalisationUpdate completed (1.24wmf7) at 2014-06-12 02:32:07+00:00
01:47 ori: graceful'd appservers for I0e66ee0a1: 2.4 compat: load mod_filter for AddOutputFilterByType
00:44 bblack: ran "puppetca -s palladium.eqiad.wmnet" on palladium to get agent running again, someone borked/regenerated the key there 6 hours ago?
00:20 mwalker: clearMessageBlobs.php killed because we fixed the problem in a more different way
00:17 logmsgbot: mwalker Synchronized php-1.24wmf8/extensions/MultimediaViewer/resources/mmv/ui/mmv.ui.canvasButtons.js: poking cache for multimediaviewer messages (duration: 00m 04s)
00:05 logmsgbot: aaron Synchronized php-1.24wmf8/includes/EditPage.php: e11d41dd366b039bff79e247368b6bff1245ea5e (duration: 00m 07s)

June 11

23:50 mwalker: clearing resourceloader blobs on commonswiki to try and force a multimediaviewer message "mwscript extensions/WikimediaMaintenance/clearMessageBlobs.php --wiki=commonswiki"
23:49 awight: updated SmashPig from 98b1f348aa55f6a3aac441db08a59ca309fade7a to 22e2923a3a030b17815181574f9ca99b38c5f2dc
23:41 logmsgbot: mwalker Finished scap: SWAT deploy for MultimediaViewer, CentralNotice, and testwiki config (duration: 24m 16s)
23:16 logmsgbot: mwalker Started scap: SWAT deploy for MultimediaViewer, CentralNotice, and testwiki config
23:10 Krinkle: Running deleteEqualMessages.php on trwiki (bug 43917)
22:58 logmsgbot: yurik Synchronized wmf-config/: Restoring to ZRMA for now (duration: 01m 04s)
22:22 logmsgbot: yurik Synchronized wmf-config/InitialiseSettings.php: Attempting to enable new zero ext on zerowiki & ruwiki - take2 (duration: 01m 06s)
22:19 ^d: restarted elasticsearch on logstash1003, complaining about heap.
22:06 logmsgbot: yurik Synchronized wmf-config/InitialiseSettings.php: Attempting to enable new zero ext on zerowiki & ruwiki (duration: 01m 12s)
21:58 logmsgbot: yurik Synchronized php-1.24wmf8/extensions/JsonConfig/: (no message) (duration: 01m 11s)
21:56 logmsgbot: yurik Synchronized php-1.24wmf7/extensions/JsonConfig/: (no message) (duration: 01m 09s)
21:50 logmsgbot: yurik Finished scap: (no message) (duration: 25m 51s)
21:46 ori: Disabling Puppet on mw1149. It's a former bits app server that isn't in PyBal so it isn't getting traffic. Going to stage some proposed changes for apache-config and operations/puppet there.
21:24 logmsgbot: yurik Started scap: (no message)
21:05 logmsgbot: yurik Finished scap: Deploying 3 new ext (JsonConfig, ZeroBanner, ZeroPortal), but they are not enabled anywhere yet (duration: 05m 03s)
21:00 logmsgbot: yurik Started scap: Deploying 3 new ext (JsonConfig, ZeroBanner, ZeroPortal), but they are not enabled anywhere yet
20:07 gwicke: deployed Parsoid 3de0dba15
19:18 bblack: rebooting lvs3003 for 3.13 kernel
19:17 logmsgbot: marktraceur Finished scap: MultimediaViewer fixes for cards 630, 429, and 697 (duration: 18m 45s)
19:17 greg-g: mw1151 *still* giving permission denied errors (publickey), what's the status, yo?
19:03 bblack: rebooting lvs3002 for 3.13 kernel + XPS
18:59 logmsgbot: marktraceur Started scap: MultimediaViewer fixes for cards 630, 429, and 697
18:44 ottomata: disabling puppet on analytics1012 to allow for more replica threads to catch up with current broker replicas...maybe :)
18:41 awight: updated crm from b6815d29de97b80a0ab65db576213a604f0c7cb9 to c38296add61421f87e12cb5b4f3dd68bdf2340db
18:03 Krinkle: Reloading Zuul to deploy I5d154a4002d08
16:43 bblack: shutting off lvs3002.esams pybal to test XPS balancing of live traffic on lvs3004.esams + 3.13
16:30 bblack: rebooting lvs3004 (inactive uploads LVS) for 3.13 again
14:52 hashar: Jenkins restarting (plugin upgrades)
14:48 bblack: rebooting lvs3004.esams (inactive uploads LVS) for 3.13 kernel
14:41 _joe_: manually ran 'planet' on en.planet to restore technews
14:40 hashar: Jenkins updating plugins
13:56 paravoid: upgrading mw1153-mw1160, tmh1001-tmh1002 for USN-2244-1
12:21 _joe_: set up a secondary remote named 'readonly' in /a/common on tin, to use with the icinga check for unmerged commits
11:40 akosiaris: manually cleaning librenms tables. db1001 is going to have increased load for some time. The approach is automatable, see http://jira.observium.org/browse/OBSERVIUM-757
11:32 godog: restarted uwsgi on tungsten, a lot of accesses to reqstats.edits.*.submits
10:45 godog: restarted uwsgi on tungsten, hung on fetching many metrics
09:54 _joe_: restarted apache on palladium - passenger crashed
05:26 paravoid: restarting all swift daemons across the cluster to fix runaway threads due to rsyslog restart
05:04 springle: beginning schema changes bug 49193 page_content_model
03:29 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Jun 11 03:28:14 UTC 2014 (duration 28m 13s)
02:29 logmsgbot: LocalisationUpdate completed (1.24wmf8) at 2014-06-11 02:28:18+00:00
02:15 logmsgbot: LocalisationUpdate completed (1.24wmf7) at 2014-06-11 02:14:43+00:00

June 10

23:34 andrewbogott: updated labs Trusty image w/puppet3, made default
23:19 mutante: rebooting unresponsive ms-be1003
21:09 RobH: monthly sms credit check: 1,447.36 SMS credits. will check again in 30 days
19:47 hashar: Jenkins restarted apparently properly. Any breakage would probably be related to the version switch :-D
19:45 ottomata: power cycling analytics1012, attempting to reinstall as kafka broker with new kafka partman recipe
19:42 hashar: Jenkins upgraded from 1.532.2 to 1.554.2 (i.e. bumped to a new LTS version).
19:37 hashar: Broke Jenkins by silently upgrading it :-(
19:09 Krinkle: git-deploy: Deploying integration/slave-scripts I9521890b911714edf2
18:59 logmsgbot: reedy Synchronized php-1.24wmf8/skins/vector/components/tabs.less: (no message) (duration: 00m 14s)
18:58 mutante: shutting down ekrem
18:18 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: Enable data transclusion for wikiquote (duration: 00m 14s)
18:15 logmsgbot: reedy Synchronized docroot and w: Update non Wikipedias to 1.24wmf8 (duration: 00m 16s)
18:15 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Update non Wikipedias to 1.24wmf8
18:14 logmsgbot: reedy Synchronized php-1.24wmf8/extensions/Wikidata/: (no message) (duration: 00m 16s)
17:28 _joe|away: restarted profiler-to-carbon, stuck waiting data from mwprof
15:21 mutante: ekrem - rm from stored configs/icinga
15:12 mutante: ekrem - revoke salt,puppet keys, stop agents/minion
07:42 springle: enabled pt-slave-delay for dbstore1001, 24h all shards
06:12 springle: xtrabackup clone db1043 to db1048
04:57 springle: db1048 down for upgrade
03:40 springle: switched mchenry to use m2-master/m2-slave for OTRS address lookups
03:25 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Jun 10 03:24:19 UTC 2014 (duration 24m 18s)
02:29 logmsgbot: LocalisationUpdate completed (1.24wmf8) at 2014-06-10 02:28:14+00:00
02:27 springle: switched traffic db1048 to db1020. broke gerrit briefly; see ops email
02:15 logmsgbot: LocalisationUpdate completed (1.24wmf7) at 2014-06-10 02:14:41+00:00
01:33 chasemp: restarted gerrit on ytterbium
01:01 manybubbles: upgraded all elasticsearch servers in production to 1.2.1. They are just restoring the last few shards on the last node now and they'll spend a few hours tonight rebalancing after the upgrade but otherwise I'm done.
00:41 mwalker: updating donationinterface on payments from b4c5cf1bceb70d65eae28cdd0873036dc33c8992 to 6d74002f2634f41f7038daa7357ff6de55ee4880 for worldpay form error

June 9

23:58 manybubbles: lied - upgrading elastic1014
23:57 manybubbles: upgrading elastic1015
23:30 Krinkle: Reloading Zuul to deploy 6727b8b
23:12 logmsgbot: maxsem Synchronized php-1.24wmf8/extensions/MobileApp: (no message) (duration: 00m 03s)
23:11 logmsgbot: maxsem Synchronized php-1.24wmf7/extensions/MobileApp: (no message) (duration: 00m 03s)
23:03 logmsgbot: maxsem Synchronized wmf-config/InitialiseSettings.php: https://bugzilla.wikimedia.org/66377 (duration: 00m 04s)
20:42 manybubbles: upgraded elastic1007-elastic1010 without issue - starting elastic1010
20:08 subbu: deployed Parsoid 9b673587 (deploy sha 7d0097a1)
19:23 ottomata: disabling puppet on analytics1012
18:59 ottomata: decommissioning analytics1012 in hadoop cluster, this will become a Kafka broker
17:58 manybubbles: elastic1004-1006 upgraded without trouble - cluster is working on filling elastic1006 before moving on to 1007, and the rest
17:04 andrewbogott: switching labs to puppet3
17:03 awight: update crm from b38497a9d0ef75fe2b20b03b649ac13a5e3f47a7 to b6815d29de97b80a0ab65db576213a604f0c7cb9
16:30 manybubbles: upgrading elastic1003 - upgrade is going well so far so I'm going to stop watching it as closely and let it be more automated
15:28 manybubbles: elastic1001 went well, doing 1002 by hand again
15:17 logmsgbot: anomie Synchronized php-1.24wmf8/extensions/Wikidata: SWAT: Wikidata entity suggester bug fixes gerrit:138339 (duration: 00m 16s)
15:12 greg-g: mw1151 still "permission denied" during deploys
15:12 logmsgbot: anomie Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable TemplateData GUI on Portuguese Wikipedia gerrit:137986 (duration: 00m 14s)
15:09 logmsgbot: anomie Synchronized php-1.24wmf7/extensions/VisualEditor/modules/ve-mw/ui/dialogs/ve.ui.MWSaveDialog.js: SWAT: VE fix for focus regression gerrit:137978 (duration: 00m 15s)
15:06 andrewbogott: beta updating all instances to puppet 3 via a cherry-pick of https://gerrit.wikimedia.org/r/#/c/137898/ on deployment-salt
15:05 logmsgbot: anomie Synchronized php-1.24wmf8/extensions/VisualEditor/modules/ve-mw/: SWAT: VE fix for focus regression and alignment issues gerrit:137971 gerrit:138122 (duration: 00m 14s)
15:01 manybubbles: successfully synced plugins, upgrading elastic1001 to make sure everything is working ok with it - then we'll run through the others more quickly
14:57 manybubbles: syncing elasticsearch plugins for 1.2.1 - any elasticsearch restart from here on out needs to come with 1.2.1 or the node will break.
14:54 manybubbles: starting Elasticsearch upgrade with elastic1001
07:14 springle: disabled puppet on analytics1021 to avoid kafka broker restarting with missing mount
05:15 springle: xtrabackup clone db1046 to db1020
04:44 springle: umount /dev/sdf on analytics1021, fs in r/o mode, kafka broker not running. no checks yet
03:24 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Jun 9 03:23:05 UTC 2014 (duration 23m 4s)
02:29 logmsgbot: LocalisationUpdate completed (1.24wmf8) at 2014-06-09 02:28:08+00:00
02:15 logmsgbot: LocalisationUpdate completed (1.24wmf7) at 2014-06-09 02:14:46+00:00

June 8

23:27 p858snake|l: icinga has been shitting in the channel for 9+ hours (before I went to bed) about Varnishkafka, nothing noted in SAL. Here be a note about it.
03:22 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Jun 8 03:21:28 UTC 2014 (duration 21m 27s)
02:28 logmsgbot: LocalisationUpdate completed (1.24wmf8) at 2014-06-08 02:27:21+00:00
02:15 logmsgbot: LocalisationUpdate completed (1.24wmf7) at 2014-06-08 02:14:10+00:00

June 7

23:48 hoo: Fixed four CentralAuth log entries on meta which were logged for WikiSets/0
21:36 manybubbles: that means I turned off puppet and shut down Elasticsearch on elastic1017 - you can expect the cluster to go yellow for half an hour or so while the other nodes rebuild the redundancy that elastic1017 had
21:35 manybubbles: after consulting logs - elastic1017 has had high io wait since it was deployed - I'm taking it out of rotation
21:31 manybubbles: elastic1017 is sick - thrashing to death on io - restarting Elasticsearch to see if it recovers unthrashed
17:56 godog: restarted ES on elastic1017.eqiad.wmnet (at 17:22 UTC)
03:24 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Jun 7 03:23:32 UTC 2014 (duration 23m 31s)
02:31 logmsgbot: LocalisationUpdate completed (1.24wmf8) at 2014-06-07 02:29:57+00:00
02:17 logmsgbot: LocalisationUpdate completed (1.24wmf7) at 2014-06-07 02:16:30+00:00

June 6

23:51 Krinkle: Restarted Jenkins, force stopped Zuul, started Zuul, configured Jenkins via web interface (disable Gearman, save, enable Gearman); Seems to be back up now, finally.
22:52 mutante: same for rhenium, titanium, bast1001, calcium, carbon, ytterbium, stat1003
22:42 RoanKattouw: Restarting Jenkins didn't help, jobs still aren't making it across from Zuul into Jenkins
22:36 RoanKattouw: Restarting stuck Jenkins
22:35 mutante: same for holmium, hafnium, silver, netmon1001, magnesium, neon, antimony
22:17 mutante: upgraded ssl packages on zirconium
21:57 Krinkle: Took Jenkins slave on gallium temporarily offline and back online to resolve possible stagnation
20:56 awight_: updated crm from ded541894a70922e098fb3ea48306c8ec0f0f6aa to b38497a9d0ef75fe2b20b03b649ac13a5e3f47a7
18:24 mwalker: updating payments from e823354822c7a35e6c2069d3e72180a45dbc89dc to b4c5cf1bceb70d65eae28cdd0873036dc33c8992 for globalcollect oid hack
14:04 hashar: Gerrit back. chase rebooted it :)
13:55 hashar: Gerrit having some troubles: error: RPC failed; result=22, HTTP code = 503 (while cloning CirrusSearch )
12:58 cmjohnson1: replacing raid controller db1020
06:12 Tim: on osmium installed nodejs for testing
04:24 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Jun 6 04:23:08 UTC 2014 (duration 23m 7s)
03:13 logmsgbot: LocalisationUpdate completed (1.24wmf8) at 2014-06-06 03:12:19+00:00
02:43 logmsgbot: LocalisationUpdate completed (1.24wmf7) at 2014-06-06 02:42:28+00:00
00:38 bblack: nginx restarted on ssl*
00:16 mutante: fixed permissions on bugzilla's index.cgi, sry

June 5

23:18 logmsgbot: maxsem Synchronized php-1.24wmf7/includes/ChangeTags.php: https://gerrit.wikimedia.org/r/#/c/137563/ (duration: 00m 03s)
23:16 logmsgbot: maxsem Synchronized php-1.24wmf8/includes/ChangeTags.php: https://gerrit.wikimedia.org/r/#/c/137563/ (duration: 00m 03s)
23:06 logmsgbot: maxsem Synchronized php-1.24wmf7/extensions/TemplateData: https://gerrit.wikimedia.org/r/#/c/137751/ (duration: 00m 04s)
22:15 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: Ife5081549: Put $wgRCFeeds[rcs100x] config behind $wmfRealm check (duration: 00m 04s)
22:12 logmsgbot: ori updated /a/common to Ife5081549: Put $wgRCFeeds['rcs100x'] config behind $wmfRealm check
21:48 ori: updated eventlogging to a8602c1d879f
21:34 MaxSem: Renaming geo_killlist and geo_updates to *_old
18:36 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 14s)
18:35 logmsgbot: reedy Synchronized database lists: (no message) (duration: 00m 13s)
18:17 Reedy: Created FlaggedRevs tables on ckbwiki
18:11 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Update group0 to 1.24wmf8
18:06 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikipedias to 1.24wmf7
17:00 logmsgbot: reedy Synchronized wmf-config/: Wrap some long lines, add some docs (duration: 00m 26s)
16:43 bblack: rebooting lvs3002
16:36 paravoid: downpref all of amslvs* in favor of lvs30*
16:17 paravoid: downprefing amslvs1, upprefing lvs3001
16:02 mark: Connected cp3018:eth1 to cr1-esams:xe-0/0/3 (unconfigured)
15:59 _joe_: disabling puppet on virt1000 while we test the puppet3 upgrade on virt0
15:48 logmsgbot: reedy Finished scap: 2nd scap for 1.24wmf8, should be effectively a nooop (duration: 12m 33s)
15:35 logmsgbot: reedy Started scap: 2nd scap for 1.24wmf8, should be effectively a nooop
15:21 logmsgbot: anomie Synchronized php-1.24wmf6/extensions/VisualEditor/modules/ve-mw/ui/dialogs/: SWAT: Use <visualeditor-toolbar-cite-label> correctly in the Media and Reference toolbars gerrit:136783 (duration: 00m 15s)
15:18 logmsgbot: anomie Synchronized php-1.24wmf7/extensions/VisualEditor/modules/ve-mw/ui/dialogs/: SWAT: Use <visualeditor-toolbar-cite-label> correctly in the Media and Reference toolbars gerrit:136782 (duration: 00m 12s)
15:04 logmsgbot: anomie Synchronized php-1.24wmf7/extensions/Popups/resources/: SWAT: Hovercard animation fixes gerrit:137530 gerrit:137531 gerrit:137532 (duration: 00m 14s)
14:57 logmsgbot: reedy Finished scap: testwiki to 1.24wmf8 and build l10n cache (duration: 26m 23s)
14:54 hashar: restarting Zuul
14:31 logmsgbot: reedy Started scap: testwiki to 1.24wmf8 and build l10n cache
14:15 logmsgbot: reedy scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki="testwiki" --list-file="/a/common/wmf-config/extension-list" --output="/tmp/tmp.hiiCprts7Z" ' returned non-zero exit status 1 (duration: 00m 17s)
14:14 logmsgbot: reedy Started scap: testwiki to 1.24wmf8 and build l10n cache
14:07 logmsgbot: reedy scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki="testwiki" --list-file="/a/common/wmf-config/extension-list" --output="/tmp/tmp.WtQBrR6JUp" ' returned non-zero exit status 1 (duration: 01m 08s)
14:06 logmsgbot: reedy Started scap: testwiki to 1.24wmf8 and build l10n cache
14:05 logmsgbot: reedy Purged l10n cache for 1.24wmf5
13:58 hashar: Adding unit tests Jenkins job for most mediawiki extensions 137578
12:05 godog: powercycling ms-be1005, no ssh, no console
10:28 godog: restarted uwsgi on tungsten
09:24 godog: moving bits traffic to the general appserver pool in eqiad
04:10 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Jun 5 04:09:50 UTC 2014 (duration 9m 49s)
03:03 logmsgbot: LocalisationUpdate completed (1.24wmf7) at 2014-06-05 03:02:00+00:00
02:33 logmsgbot: LocalisationUpdate completed (1.24wmf6) at 2014-06-05 02:32:06+00:00
02:23 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: depool db1007 (duration: 01m 26s)
00:46 bblack: lvs3002 (live uploads lb for esams) is running ntpd

June 4

23:43 Tim: on searchidx1001: restarting lsearchd and indexer
23:40 logmsgbot: mwalker Finished scap: Scapping for SWAT; MultiMedia viewer and config changes (duration: 22m 16s)
23:20 Tim: on searchidx1001: as a temporary hack to work around scap disk full errors, set up a bind mount at /usr/local/apache/common-local linking to a directory in /a, by local modification of /etc/fstab
23:18 logmsgbot: mwalker Started scap: Scapping for SWAT; MultiMedia viewer and config changes
21:56 logmsgbot: yurik Synchronized wmf-config/: https://gerrit.wikimedia.org/r/#/c/136503/ (duration: 01m 07s)
21:54 logmsgbot: yurik Synchronized mobilelanding.php: (no message) (duration: 01m 07s)
20:47 MaxSem: Truncating geo_killlist everywhere
20:33 subbu: deployed Parsoid 165a2042 (deploy sha fc1b1ed4)
19:04 bd808|deploy: Restarted elasticsearch on logstash1001; JVM OOM
19:00 logmsgbot: maxsem Synchronized php-1.24wmf7/extensions/GeoData/: (no message) (duration: 00m 04s)
18:58 logmsgbot: maxsem Synchronized php-1.24wmf6/extensions/GeoData/: (no message) (duration: 00m 03s)
18:43 bd808|deploy: mw1151 gave an ssh denied error for MaxSem during sync-dir
18:40 logmsgbot: maxsem Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/136487/ (duration: 00m 04s)
17:54 mutante: shutting down solr1001-1003
17:47 logmsgbot: yurik Synchronized php-1.24wmf7/extensions/ZeroRatedMobileAccess/: (no message) (duration: 01m 07s)
17:44 logmsgbot: yurik Synchronized php-1.24wmf6/extensions/ZeroRatedMobileAccess/: (no message) (duration: 01m 06s)
17:27 mutante: stopping puppet/salt on solr100[13], removed from icinga
16:36 robh: blog.wikimedia.org updated to latest wp version
16:13 mutante: installing package upgrades on bast1001
16:11 mutante: installing package upgrades on iron
15:59 mutante: killing puppet certs,salt keys for solr100[13].eqiad - decom
15:28 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: close wikimania2013wiki for real (duration: 00m 10s)
15:28 logmsgbot: manybubbles Synchronized closed.dblist: close wikimania2013wiki (duration: 00m 09s)
15:23 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: close wikimania2013wiki (duration: 00m 10s)
15:21 logmsgbot: manybubbles Synchronized php-1.24wmf6/extensions/MobileApp/: (no message) (duration: 00m 10s)
15:15 logmsgbot: manybubbles Synchronized php-1.24wmf7/extensions/MobileApp/: (no message) (duration: 00m 08s)
15:07 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: SWAT deploy for media viewer (duration: 00m 13s)
14:57 mutante: cleaning up duplicate cronjobs on terbium - all log to /var/log/mediawiki now
12:53 hashar: Zuul upgraded (git tag wmf-deploy-20140604 ). Merges are now done by an independent process zuul-merger
12:43 hashar: upgrading Zuul to split the merger part to an independent process. Short unscheduled downtime starting in a few minutes
07:51 _joe_: rebooted ms-be1001, host unresponsive to ping, blank console
06:14 springle: starting online schema change, bug 66089 gerrit 137149
04:27 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Jun 4 04:26:32 UTC 2014 (duration 26m 31s)
03:35 Krinkle: Deploy I882e3fa57b2e5e3de in Zuul and reload config
03:16 logmsgbot: LocalisationUpdate completed (1.24wmf7) at 2014-06-04 03:15:34+00:00
02:47 logmsgbot: LocalisationUpdate completed (1.24wmf6) at 2014-06-04 02:46:06+00:00

June 3

23:14 logmsgbot: ori Synchronized php-1.24wmf7/extensions/MobileApp: SWAT cherry-picks for MobileApp (with patch) (duration: 00m 04s)
23:11 logmsgbot: ori Synchronized php-1.24wmf6/extensions/MobileApp: SWAT cherry-picks for MobileApp (duration: 00m 04s)
23:10 logmsgbot: ori Synchronized php-1.24wmf7/extensions/MobileApp: SWAT cherry-picks for MobileApp (duration: 00m 03s)
23:06 logmsgbot: ori Synchronized wmf-config/InitialiseSettings.php: I9dac0dc6a80: Set $wgIncludejQueryMigrate = true; for all wikis (duration: 00m 03s)
22:41 logmsgbot: marktraceur Finished scap: Update Media Viewer preference string for wmf7 - already backported to wmf6 (duration: 13m 19s)
22:38 Krinkle: git-deploy: Deploying integration/slave-scripts If2e2e675802f
22:27 logmsgbot: marktraceur Started scap: Update Media Viewer preference string for wmf7 - already backported to wmf6
21:49 logmsgbot: marktraceur updated /a/common to I409703a11: Enable MMV by default on dewiki beta.
21:25 logmsgbot: marktraceur Synchronized mediaviewer.dblist: Enable media viewer by default on enwiki (duration: 00m 06s)
21:18 logmsgbot: marktraceur Synchronized wmf-config/InitialiseSettings.php: Throttle the MMV event logging a bit more for the launch today (duration: 00m 06s)
21:17 logmsgbot: marktraceur updated /a/common to I549906510: Launch Media Viewer for all users on English wikipedia
21:09 logmsgbot: marktraceur Synchronized wmf-config/InitialiseSettings.php: Touch InitialiseSettings.php because that's what we do (duration: 00m 06s)
21:08 logmsgbot: marktraceur Synchronized mediaviewer.dblist: Add dewiki to the on-by-default list for Media Viewer (duration: 00m 06s)
21:08 logmsgbot: marktraceur updated /a/common to Ie237b0ae1: Launch Media Viewer for all users on German wikipedia
20:51 MaxSem: Disabled GeoData updates on terbium
20:41 hashar: repack command: find /srv/ssd/gerrit/ -type d -name '*.git' -print -exec git --git-dir="{}" repack -afd \; -exec git --git-dir="{}" pack-refs --all \;
20:41 hashar: Jenkins repacking gerritslave replicas on gallium and lanthanum. Running in screen as hashar -> gerritslave
18:14 logmsgbot: reedy Synchronized wmf-config/: Stop sending IRC RC to PMTPA (duration: 00m 17s)
18:07 logmsgbot: reedy Synchronized docroot and w: (no message) (duration: 00m 14s)
18:05 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: All non wikipedias to 1.24wmf7
15:45 akosiaris: merged https://gerrit.wikimedia.org/r/#/c/133515/ which enabled ferm on hydrogen/chromium
15:41 logmsgbot: anomie Finished scap: SWAT: Update i18n for MultimediaViewer gerrit:136718 (duration: 17m 56s)
15:23 logmsgbot: anomie Started scap: SWAT: Update i18n for MultimediaViewer gerrit:136718
15:03 logmsgbot: anomie Synchronized wmf-config/InitialiseSettings.php: SWAT: Lower MediaViewer sampling for enwiki and dewiki gerrit:136717 (duration: 00m 14s)
13:05 paravoid: salt * start procps
11:13 _joe_: restarted jobrunners as they were blocked by restarting via cron
10:58 godog: try restarting mw-job-runner on mw1012
03:42 springle: revert to lvm snapshot on db1046, xfs being crotchety
03:17 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Jun 3 03:16:22 UTC 2014 (duration 16m 21s)
02:26 logmsgbot: LocalisationUpdate completed (1.24wmf7) at 2014-06-03 02:25:48+00:00
02:15 logmsgbot: LocalisationUpdate completed (1.24wmf6) at 2014-06-03 02:14:12+00:00
01:32 logmsgbot: reedy Synchronized wmf-config/CommonSettings.php: wgCentralAuthRC to EQIAD rc ircd (duration: 00m 14s)
00:28 awight: update crm from 5f6217d8f4d750087dcd37faca6b41de82d2362e to ded541894a70922e098fb3ea48306c8ec0f0f6aa

June 2

23:34 logmsgbot: maxsem Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/136428/ (duration: 00m 03s)
23:22 logmsgbot: maxsem Synchronized php-1.24wmf6/extensions/Flow/: (no message) (duration: 00m 04s)
23:21 logmsgbot: maxsem Synchronized php-1.24wmf6/extensions/VisualEditor/: (no message) (duration: 00m 03s)
23:20 logmsgbot: maxsem Synchronized php-1.24wmf7/extensions/VisualEditor/: (no message) (duration: 00m 04s)
23:18 logmsgbot: maxsem Synchronized php-1.24wmf7/extensions/Flow/: https://gerrit.wikimedia.org/r/#/c/136936/ (duration: 00m 05s)
22:49 Krinkle: Repooled integration-slave1003 in Jenkins.
22:37 Krinkle: Hack-patching integration-slave1003.eqiad.wmflabs per https://bugzilla.wikimedia.org/show_bug.cgi?id=61508#c2
21:30 mutante: searchidx1001 - low disk space, gzip MegaSAS.log, delete old kernel headers
21:18 awight: updated crm from b6e004f7349507523423c59170274150a44b0aaf to 5f6217d8f4d750087dcd37faca6b41de82d2362e
20:09 gwicke: deployed Parsoid 04a4bf2b
20:08 hashar: Jenkins unpooled integration-slave1003; npm is outdated there and does not trust npmregistry.org ( bug 61508 )
19:29 awight: updated crm from ce64066316e77f6fc3545c6265e2d81e3ef773c4 to b6e004f7349507523423c59170274150a44b0aaf
19:18 awight: update crm from 5b231163e9e880de5b9787d40b679a6723748aca to ce64066316e77f6fc3545c6265e2d81e3ef773c4
18:58 logmsgbot: csteipp Synchronized php-1.24wmf6/includes/upload/UploadBase.php: (no message) (duration: 00m 04s)
18:51 logmsgbot: csteipp Synchronized php-1.24wmf7/includes/upload/UploadBase.php: (no message) (duration: 00m 06s)
18:41 awight: updated tools from d257e8445e028b758b1d1fa90c857667d4faac62 to cbcd14a84f7bc8682822d3b1910b48bfd932b00d
17:15 chasemp: disabling ircd on ekrem
17:05 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: Cache search suggestions for 3 hours instead of 6 (duration: 00m 04s)
17:03 chasemp: moving irc.wikimedia.org to argon
16:27 ottomata: ran preferred-replica-election to fix vk delivery errors
16:24 logmsgbot: demon Synchronized wmf-config/throttle.php: Library of Israel editathon (duration: 00m 04s)
16:07 manybubbles: rebuilding all english non-wikipedias with unicode normalization
15:36 logmsgbot: manybubbles Synchronized wmf-config/: SWAT deploy - more import sources and upload domains (duration: 00m 04s)
15:34 manybubbles: reindexing all hebrew wikis to switch them from the hebrew analyzer to proper unicode normalization
15:33 ottomata: attempting to powercycle analytics1015, it is not responding to pings, no output on console
15:33 logmsgbot: manybubbles Synchronized wmf-config/: SWAT deploy changing some search settings (duration: 00m 05s)
15:26 hashar: restarted Zuul. All jobs lost :-(
15:25 hashar: Zuul stuck in a loop reporting a change :-(
15:20 hashar: Jenkins/Zuul stuck. Depooling/Repooling some slaves to reregister jobs with Zuul
14:51 ottomata: chown -R datasets /data/xmldatadumps/public/other/pagecounts-ez on dataset1001 to accompany 70a7f61, fixing bug 66005
12:44 akosiaris: manually ran puppet on mw11991
07:21 hashar: restarted Zuul unintentionally
03:12 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Jun 2 03:11:05 UTC 2014 (duration 11m 4s)
03:04 ori: ..on vanadium.
03:03 ori: moving /var/log/eventlogging/archive/* to /srv/eventlogging-logs to free up space on the root partition. unpuppetized for now, sadly.
02:25 logmsgbot: LocalisationUpdate completed (1.24wmf7) at 2014-06-02 02:23:57+00:00
02:13 logmsgbot: LocalisationUpdate completed (1.24wmf6) at 2014-06-02 02:12:53+00:00
02:06 logmsgbot: tstarling Synchronized php-1.24wmf6: Revert "Use square bounding boxes for default-sized thumbnails" (duration: 01m 18s)
02:02 logmsgbot: tstarling Synchronized php-1.24wmf7: (no message) (duration: 01m 31s)

June 1

05:41 awight: updated payments from 7c695e9c4c7386a7585b6067df29b8caaaa089f0 to e823354822c7a35e6c2069d3e72180a45dbc89dc
03:10 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Jun 1 03:09:28 UTC 2014 (duration 9m 27s)
02:25 logmsgbot: LocalisationUpdate completed (1.24wmf7) at 2014-06-01 02:23:58+00:00
02:14 logmsgbot: LocalisationUpdate completed (1.24wmf6) at 2014-06-01 02:13:23+00:00

May 31

03:14 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat May 31 03:13:47 UTC 2014 (duration 13m 46s)
02:27 logmsgbot: LocalisationUpdate completed (1.24wmf7) at 2014-05-31 02:26:20+00:00
02:16 logmsgbot: LocalisationUpdate completed (1.24wmf6) at 2014-05-31 02:15:48+00:00

May 30

22:00 greg-g: ori sync'd out a config change to Add $wgRCFeeds entries for RCStream on rcs100[12].eqiad.wmnet
21:57 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 03s)
21:21 hashar: Zuul restarted (somehow by mistake)
20:39 logmsgbot: bd808 Synchronized php-1.24wmf7/extensions/Elastica/Elastica/test/lib/Elastica/Test/IndexTest.php: Touched to test sync-file (duration: 00m 04s)
20:36 logmsgbot: bd808 Synchronized wmf-config/db-eqiad.php: Touched db-eqiad.php to test sync-file (duration: 00m 04s)
20:35 logmsgbot: bd808 Synchronized wmf-config/throttle.php: Touched to test sync-file (duration: 00m 04s)
20:35 bd808|deploy: Scap updated to 6c0c4f0
19:45 logmsgbot: bd808 Synchronized wmf-config/throttle.php: Touched to test sync-file (duration: 00m 05s)
19:40 logmsgbot: bd808 Synchronized wmf-config/db-eqiad.php: Touched db-eqiad.php to test sync-file (duration: 00m 03s)
19:33 logmsgbot: bd808 Synchronized README: Testing sync-file (duration: 00m 06s)
19:30 bd808|deploy: Scap updated to c4204dd
17:21 andrewbogott: restarted pdns on virt0 and virt1000
16:07 andrewbogott: changed the 'Ops' gid to 700 in ldap
16:02 akosiaris: enabled VT on thallium/mercury for ganeti evaluation purposes
14:54 _joe_: ran scap-rebuild-cdb on mw1163
13:47 hashar: Jenkins: removing label hasBrowserTests from labs slaves 136315
13:43 hashar: Jenkins: removing label hasHhvm from labs slaves 136315
13:42 hashar: Jenkins: removing label hasJenkinsDebianGlue from labs slaves 136315
13:26 hashar: Jenkins lowering number of executors on labs slave from 5 to 4 since they have 4 CPU
13:25 hashar: Jenkins pooling a third CI slave integration-slave1003.
09:26 hashar_: Zuul is processing jobs again. For reference bug is bug 63760
09:24 hashar_: Jenkins: disconnecting and reconnecting labs slaves to reregister them with Zuul
09:17 hashar: Jenkins/Zuul locked
03:47 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri May 30 03:46:09 UTC 2014 (duration 46m 8s)
03:01 logmsgbot: LocalisationUpdate completed (1.24wmf7) at 2014-05-30 03:00:14+00:00
02:31 logmsgbot: LocalisationUpdate completed (1.24wmf6) at 2014-05-30 02:30:51+00:00

May 29

23:36 logmsgbot: csteipp synchronized php-1.24wmf6/includes/specials/SpecialPasswordReset.php
23:19 ^d: cleaned up /var/cache/apt on searchidx1001, freed up ~20% of the disk, should be fine now
23:09 logmsgbot: ori synchronized wmf-config/InitialiseSettings.php 'I09d92987: Flow-ify mw:Talk:Search'
23:07 logmsgbot: ori synchronized cirrus.dblist 'I3f09dff7e: All Wikipedias with 100k pages or less getting Cirrus as primary'
23:05 logmsgbot: ori updated /a/common to I3f09dff7e: All Wikipedias with 100k pages or less getting Cirrus as primary
20:29 logmsgbot: marktraceur synchronized wmf-config/InitialiseSettings.php 'Enable Media Viewer on all wikisources by default'
20:28 logmsgbot: marktraceur updated /a/common to I95348e0d4: Launch Media Viewer for all users on all Wikisources
20:25 logmsgbot: anomie synchronized php-1.24wmf6/includes/api 'Revert revert of gerrit:120827, underlying bug should be fixed now'
20:19 logmsgbot: anomie synchronized php-1.24wmf6/extensions/EducationProgram/includes/api/ApiListStudents.php 'Backport fix for bugzilla:65906'
20:05 logmsgbot: anomie synchronized php-1.24wmf7/extensions/EducationProgram/includes/api/ApiListStudents.php 'Backport fix for bugzilla:65906'
19:40 ^d: hewiki elastic index was missing geodata mappings. re-map + in place reindex failed spectacularly. rebuilding from scratch now.
19:38 ori: mw1163: mkdir -p /usr/local/apache/common-local && chown mwdeploy:mwdeploy /usr/local/apache/common-local
19:09 cmjohnson1: powering down mw1151 for disk replacement
19:01 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.24wmf7
18:49 cmjohnson1: removing mw1151 from pybal and dsh groups to replace disk and reinstall
18:47 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikipedias to 1.24wmf6 take 3
18:44 logmsgbot: reedy synchronized php-1.24wmf6/includes/api/
18:19 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: rv that
18:18 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikipedias to 1.24wmf6
15:49 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: (no message)
15:42 logmsgbot: reedy Finished scap: testwiki to 1.24wmf7 and build l10n cache (duration: 24m 24s)
15:17 logmsgbot: reedy Started scap: testwiki to 1.24wmf7 and build l10n cache
15:06 logmsgbot: anomie synchronized php-1.24wmf6/extensions/VisualEditor/modules/ve-mw/ 'SWAT: VisualEditor URL decoding and image alignment fixes. gerrit:135922 gerrit:135946'
04:26 logmsgbot: springle synchronized README 'test sync-file'
04:20 ori: updated scap to 9ba9014: Partially revert "Convert sync-dir and sync-file to python"
04:13 ori: re-enabled puppet on tin
03:59 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu May 29 03:57:59 UTC 2014 (duration 57m 58s)
03:55 logmsgbot: ori Synchronized README: Debugging sync-file (duration: 00m 06s)
03:51 springle: db1009 mariadb 5.5.37 live trial with low load
03:49 logmsgbot: springle synchronized wmf-config/db-eqiad.php 'repool db1009 in s2, take #2'
03:45 ori: disabled puppet on tin and copied sync-common-file from mediawiki/tools/scap@8f2a8356c38 into /usr/local/bin to debug sync issue
03:44 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1009 in s2 (duration: 00m 08s)
03:11 logmsgbot: LocalisationUpdate completed (1.24wmf6) at 2014-05-29 03:10:15+00:00
02:36 logmsgbot: LocalisationUpdate completed (1.24wmf5) at 2014-05-29 02:35:45+00:00
01:22 awight: updated SmashPig from e9964bfec47b3796dab0a19a9545cc3abb23fde6 to 98b1f348aa55f6a3aac441db08a59ca309fade7a
01:16 awight: (rollback)
01:16 awight: updated SmashPig from 03015f3827fedea9d0f89c791604ad08ec97ba71 to e9964bfec47b3796dab0a19a9545cc3abb23fde6
01:04 awight: update SmashPig from f64f79f13cf4ab560d0bb5bd69690c827a821629 to 03015f3827fedea9d0f89c791604ad08ec97ba71

May 28

23:39 logmsgbot: demon Synchronized wmf-config/CommonSettings.php: Including external CologneBlue/Modern skins, if they exist (duration: 00m 07s)
22:27 awight: updated crm from 65a433b5564f42c3aa4f310cd4bb938ae70f841d to 5b231163e9e880de5b9787d40b679a6723748aca
22:27 awight: updated tools from 1e8029544dc19a84f6d1adf2783266e16d19ef1f to d257e8445e028b758b1d1fa90c857667d4faac62
21:23 RoanKattouw: Restarting Parsoid Varnishes per gwicke's request
20:38 mutante: enabling puppet on osmium
20:18 hashar: Jenkins: killed all phantomjs processes on gallium. They were eating all available memory. All three processes were VisualEditor qunit tests.
20:13 subbu: deployed parsoid a234af8c0 (deploy sha f17506eb)
20:12 hashar: gallium (Jenkins master) sent to swap somehow :-(
20:02 logmsgbot: bd808 Finished scap: no-op scap deleted.dblist (duration: 10m 40s)
19:52 bd808|deploy: Horrible log message; should be "no-op scap to test code changes"
19:51 logmsgbot: bd808 Started scap: no-op scap deleted.dblist
19:49 logmsgbot: bd808 Synchronized database lists: (no message) (duration: 00m 03s)
19:48 logmsgbot: bd808 Synchronized robots.txt: Testing sync-file in php (duration: 00m 03s)
19:08 bd808|deploy: Symlinks for mergeCdbFileUpdates, mwversionsinuse, refreshCdbJsonFiles, scap-rebuild-cdbs, scap-recompile and sync-common on tin still pointing to /srv/scap/bin instead of /srv/deployment/scap/scap/bin
19:06 robh: mw1053 reinstalling
18:54 logmsgbot: bd808 Synchronized robots.txt: Testing sync-file in php (duration: 00m 05s)
18:28 mutante: running puppet on jobrunners
18:27 ^d: jobrunners back up now, should slowly catch back up
18:24 bd808|deploy: Scap updated to fd7e538; Trebuchet fetch and checkout failed for mw1053.eqiad.wmnet
18:06 bd808|deploy: Restarted logstash on logstash1001; log event volume suspiciously low for the last ~35 minutes
17:59 ^d: all job runners halted at 17:39? graphite shows no jobs being run, runJobs on fluorine also has nothing since the timestamp.
17:41 logmsgbot: yurik synchronized php-1.24wmf6/extensions/ZeroRatedMobileAccess/
17:37 logmsgbot: yurik synchronized php-1.24wmf5/extensions/ZeroRatedMobileAccess/
17:32 mwalker: enabling worldpay in BE (payments from 5136b0b6852f3e949e4dc847f7137f1b7bc3037b to 7c695e9c4c7386a7585b6067df29b8caaaa089f0)
16:47 hashar: Jenkins/Zuul back. Jobs meant to be run on labs instances ended up not being registered anymore with the Zuul Gearman server. That must be a bug in the Jenkins Gearman plugin :-( bug 63760
16:31 hashar: Jenkins / Zuul locked. Looking into it
16:29 _joe_: restarted mwprof/profiler-to-carbon
15:09 logmsgbot: anomie synchronized php-1.24wmf6/extensions/Wikidata 'SWAT: Fix issue with Wikidata rollback gerrit:135767'
13:52 logmsgbot: reedy synchronized wmf-config/CommonSettings.php
13:26 logmsgbot: reedy synchronized wmf-config/ 'Enable REL1_23 in ExtensionDistributor'
13:24 manybubbles: restarting elastic1001 to revert it back to mmapfs - niofs wasn't better. worse, even.
11:23 manybubbles: restarting elastic1001 to try out niofs (instead of mmapfs) on advice from a lucene developer
03:25 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed May 28 03:24:24 UTC 2014 (duration 24m 23s)
02:25 logmsgbot: LocalisationUpdate completed (1.24wmf6) at 2014-05-28 02:24:55+00:00
02:14 logmsgbot: LocalisationUpdate completed (1.24wmf5) at 2014-05-28 02:13:02+00:00

May 27

23:06 logmsgbot: ori synchronized wmf-config/InitialiseSettings.php 'I7fdafede7: Disable the anonymous signup invite experiment'
22:00 mutante: caesium, release files: changed file owner groups mwupld->releasers-mediawiki, mobileupld->releasers-mobile (to match switch to yaml groups)
18:39 logmsgbot: reedy synchronized docroot and w
18:38 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non Wikipedias to 1.24wmf6
18:31 logmsgbot: reedy synchronized php-1.24wmf6/includes/SkinTemplate.php
18:23 logmsgbot: demon synchronized wmf-config/InitialiseSettings.php
17:46 logmsgbot: aude synchronized php-1.24wmf6/extensions/Wikidata 'JS fixes for Wikidata'
17:29 mutante: welcome new deployer cscott
16:04 gwicke: restarted parsoids after another surge in load
15:48 logmsgbot: anomie synchronized wmf-config/InitialiseSettings.php 'SWAT: move-categorypages permission changes on fawiki gerrit:135426'
15:42 logmsgbot: anomie synchronized php-1.24wmf5/extensions/UniversalLanguageSelector/resources/ 'SWAT: Update ULS to fix beta feature gerrit:135535'
15:40 logmsgbot: anomie synchronized php-1.24wmf6/extensions/UniversalLanguageSelector/resources/ 'SWAT: Update ULS to fix beta feature gerrit:135310'
15:32 logmsgbot: anomie synchronized php-1.24wmf6/includes/Title.php 'SWAT: Check correct message in category moving gerrit:135211'
15:27 logmsgbot: anomie synchronized php-1.24wmf5/includes/Title.php 'SWAT: Check correct message in category moving gerrit:135210'
15:21 logmsgbot: anomie synchronized php-1.24wmf6/includes/HistoryBlob.php 'SWAT: Revert another visibility change that causes errors bugzilla:65665 gerrit:135574'
15:15 logmsgbot: anomie synchronized php-1.24wmf6/includes/revisiondelete/ 'SWAT: Revert another visibility change that causes fatal errors bugzilla:65733 gerrit:135389'
15:13 logmsgbot: anomie synchronized php-1.24wmf5/includes/revisiondelete/ 'SWAT: Revert another visibility change that causes fatal errors bugzilla:65733 gerrit:135388'
15:05 logmsgbot: anomie synchronized php-1.24wmf6/extensions/VisualEditor/modules/ve-mw/ 'SWAT: Fix for VisualEditor image alignment regression gerrit:135171'
12:19 Reedy: Created SecurePoll tables on zerowiki, legalteamwiki, zhwikivoyage, viwikivoyage, tyvwiki
11:40 godog: restart apache2 on tungsten, many report.py hung
05:42 gwicke: restarted parsoids after load surge
03:13 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue May 27 03:12:29 UTC 2014 (duration 12m 28s)
02:25 logmsgbot: LocalisationUpdate completed (1.24wmf6) at 2014-05-27 02:24:51+00:00
02:14 logmsgbot: LocalisationUpdate completed (1.24wmf5) at 2014-05-27 02:13:08+00:00
01:39 springle: starting updateCollation on s2 cs-wiki from tin
01:36 logmsgbot: springle synchronized wmf-config/InitialiseSettings.php '$wgCategoryCollation to uca-cs on cswiki'

May 26

09:17 hashar: bugzilla.bugs_fulltext bug was bug 65762
09:16 _joe_: repaired table bugzilla.bugs_fulltext on db1001 as it was marked as crashed
03:11 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon May 26 03:10:02 UTC 2014 (duration 10m 1s)
02:25 logmsgbot: LocalisationUpdate completed (1.24wmf6) at 2014-05-26 02:24:39+00:00
02:14 logmsgbot: LocalisationUpdate completed (1.24wmf5) at 2014-05-26 02:13:01+00:00

May 25

03:09 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun May 25 03:08:47 UTC 2014 (duration 8m 46s)
02:25 logmsgbot: LocalisationUpdate completed (1.24wmf6) at 2014-05-25 02:24:10+00:00
02:14 logmsgbot: LocalisationUpdate completed (1.24wmf5) at 2014-05-25 02:13:47+00:00

May 24

03:02 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat May 24 03:01:50 UTC 2014 (duration 1m 49s)
02:21 logmsgbot: LocalisationUpdate completed (1.24wmf6) at 2014-05-24 02:20:11+00:00
02:13 logmsgbot: LocalisationUpdate completed (1.24wmf5) at 2014-05-24 02:12:25+00:00
00:34 mutante: fixing Aaron's and Ariel's file permissions on fenari

May 23

19:43 logmsgbot: anomie synchronized php-1.24wmf6/includes/HistoryBlob.php 'Backport fix for bug 65665 to 1.24wmf6 gerrit:135089'
19:24 awight: updated fr-tools from 73921d4b4a7ba69b703340ed56e513f8ae8e0bb5 to 1e8029544dc19a84f6d1adf2783266e16d19ef1f
18:37 mwalker: updated payments wiki from d99177518b741e7fe18ffda86c83f93c72e164a6 for worldpay
18:22 Jeff_Green: ran authdns-update to merge new wikimedia.community dns zone
17:02 bd808: Starting rolling update of elasticsearch for logstash cluster
16:20 bd808: restarted elasticsearch on logstash1002
16:17 bd808: Elasticsearch on logstash1002 dead due to OOM at 2014-05-23T00:34:03Z
14:52 hashar: killed -9 a remaining Jenkins process
14:21 _joe_: killed zuul server, as was stuck
13:50 _joe_: killed & started jenkins, jvm stuck, unresponsive to jstack
13:17 manybubbles: restarting jenkins because it seems stuck
11:05 mark: Setup BFD on Zayo link between cr2-ulsfo and cr1-eqiad
11:01 mark: Setup BFD on GTT link between cr1-ulsfo and cr2-eqiad
07:30 _joe_: powercycling ms-be1007, unresponsive, console blank, no way to debug
04:17 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri May 23 04:16:15 UTC 2014 (duration 16m 14s)
03:23 logmsgbot: LocalisationUpdate completed (1.24wmf6) at 2014-05-23 03:22:07+00:00
02:39 logmsgbot: LocalisationUpdate completed (1.24wmf5) at 2014-05-23 02:38:33+00:00

May 22

23:22 mutante: osmium,mw1151 fixed UID of mwalker (605->2454)
23:16 bd808: Ran sync-common manually on osmium and mw1151
23:14 mwalker: sync-dir failed for osmium and mw1151
23:14 logmsgbot: mwalker synchronized php-1.24wmf6/extensions/VisualEditor 'Syncing the extension manually because of scap failures on osmium, mw1010, mw1070, mw1161, mw1201, and mw1151'
23:11 logmsgbot: mwalker Finished scap: SWAT Update to VisualEditor 134941 (duration: 03m 04s)
23:08 logmsgbot: mwalker Started scap: SWAT Update to VisualEditor 134941
21:43 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Ifd048bafe0eb4af8765cee20a3d93d7663b1bcdf'
21:33 logmsgbot: reedy synchronized multiversion/MWMultiVersion.php
21:29 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Iecdd8c5e60a142363b40e34d4fe2f27f0e5feef5'
21:22 logmsgbot: demon synchronized wmf-config/InitialiseSettings.php 'touching'
21:21 logmsgbot: demon synchronized wmf-config/flaggedrevs.php 'Removing FR from mw.org'
21:19 logmsgbot: demon synchronized flaggedrevs.dblist 'Removing FR from mw.org'
20:10 logmsgbot: marktraceur synchronized wmf-config/InitialiseSettings.php 'Sync for mediaviewer.dblist change'
20:06 logmsgbot: marktraceur synchronized mediaviewer.dblist 'Enabling Media Viewer on itwiki and ruwiki by default'
20:04 logmsgbot: marktraceur updated /a/common to I1c658bf65: Remove VE formula editor from BF whitelist (graduated)
19:54 logmsgbot: reedy Purged l10n cache for 1.24wmf4
19:54 logmsgbot: reedy Purged l10n cache for 1.24wmf3
19:54 logmsgbot: reedy Purged l10n cache for 1.24wmf2
19:53 logmsgbot: reedy Purged l10n cache for 1.24wmf1
19:53 logmsgbot: reedy Purged l10n cache for 1.23wmf22
19:52 logmsgbot: reedy Purged l10n cache for 1.23wmf21
19:38 paravoid: cr1/2-ulsfo: BGP peering with AS11820 (WMF Corp HQ)
19:28 Reedy: Ran patch-fr_page_rev-index.sql patch on fawiki
19:27 Reedy: Created flaggedrevs_statistics table on fawiki
19:04 logmsgbot: reedy synchronized wmf-config/
18:30 logmsgbot: reedy synchronized wmf-config/ 'I7a02f2615d98428b6f27514e75d935d36e44fcb1'
18:22 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.24wmf6
18:04 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikipedias to 1.24wmf5
17:17 mutante: powercycling ms-be1012
17:05 logmsgbot: reedy Finished scap: testwiki to 1.24wmf6 and build l10n cache (duration: 28m 31s)
16:37 logmsgbot: reedy Started scap: testwiki to 1.24wmf6 and build l10n cache
16:30 mutante: maerlant - it was done for ~8d, old test host that didn't really do anything, revoked salt/puppet certs, removing from Icinga..
15:52 logmsgbot: anomie synchronized wmf-config/InitialiseSettings.php 'SWAT: Fix typo in MultimediaViewer logging config gerrit:134837'
15:31 logmsgbot: anomie synchronized php-1.24wmf5/extensions/MultimediaViewer/ 'SWAT: Deploy new MultimediaViewer logging to wmf5 wikis gerrit:134804'
15:22 logmsgbot: anomie synchronized wmf-config/CommonSettings.php 'SWAT: Disable old MultimediaViewer logging and pre-enable new logging gerrit:134343'
15:21 logmsgbot: anomie synchronized wmf-config/InitialiseSettings.php 'SWAT: Disable old MultimediaViewer logging and pre-enable new logging gerrit:134343'
15:13 logmsgbot: anomie synchronized php-1.24wmf5/extensions/MultimediaViewer/tests/qunit/mmv/ui/ 'SWAT: Fix qunit tests for MultimediaViewer gerrit:134807'
14:31 paravoid: pushing new swift rings
03:38 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu May 22 03:37:31 UTC 2014 (duration 37m 30s)
02:39 logmsgbot: LocalisationUpdate completed (1.24wmf5) at 2014-05-22 02:38:18+00:00
02:15 logmsgbot: LocalisationUpdate completed (1.24wmf4) at 2014-05-22 02:14:49+00:00

May 21

23:28 logmsgbot: maxsem synchronized php-1.24wmf5/extensions/MultimediaViewer 'touch'
23:22 logmsgbot: maxsem synchronized php-1.24wmf5/extensions/MultimediaViewer/ 'https://gerrit.wikimedia.org/r/#/c/134750/'
23:16 logmsgbot: maxsem synchronized php-1.24wmf5/extensions/Flow 'https://gerrit.wikimedia.org/r/#/c/134746/'
22:18 Krinkle: Running deleteEqualMessages.php on guwiki (bug 43917)
21:03 logmsgbot: demon synchronized wmf-config/InitialiseSettings.php 'GeoData using Elasticsearch everywhere'
20:07 subbu: deployed Parsoid 95929801b (deploy sha ae83633a)
19:56 awight: updated tools from c1f50f6909b04768f3a8faa50b25e88a43f89606 to 73921d4b4a7ba69b703340ed56e513f8ae8e0bb5
19:05 mutante: welcome new deployer tgr
18:22 andrewbogott: restarting gerrit service
15:47 logmsgbot: anomie synchronized php-1.24wmf5/tests/qunit/suites/resources/mediawiki/mediawiki.user.test.js 'May as well sync this too'
15:45 logmsgbot: anomie synchronized php-1.24wmf5/resources/src/mediawiki/mediawiki.user.js 'SWAT: Use mw.log.deprecate to track user() and anonymous()'
15:28 logmsgbot: anomie synchronized php-1.24wmf5/includes/filerepo/file/LocalFile.php 'SWAT: Tweaked timestamp logic in recordUpload2'
15:19 logmsgbot: anomie synchronized php-1.24wmf5/includes/filerepo/file/LocalFile.php 'SWAT: Replace FOR UPDATE with LockManager use in LocalFile::lock()'
15:18 logmsgbot: anomie synchronized php-1.24wmf5/includes/filebackend/FileBackend.php 'SWAT: Replace FOR UPDATE with LockManager use in LocalFile::lock()'
14:11 logmsgbot: springle synchronized wmf-config/db-eqiad.php 'raise db1068 to normal load'
12:30 hashar: Jenkins: updated sysadmin email address from nobody@integration.wikimedia.org to jenkins-bot@wikimedia.org
12:22 logmsgbot: springle synchronized wmf-config/db-eqiad.php 'pool db1068 in s4, warm up'
07:25 awight: updated tools from d2437564c56881f6b879403f2f6f2f554b6b0391 to c1f50f6909b04768f3a8faa50b25e88a43f89606
07:17 awight: updated tools from ee31fc94b17c11a48ddac19aabfcdaab69fd2f72 to d2437564c56881f6b879403f2f6f2f554b6b0391
03:42 springle: resume xtrabackup db1049 to db1068, throttled
03:13 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed May 21 03:12:31 UTC 2014 (duration 12m 30s)
03:13 mutante: merging Change-Id: I2827d1ef347 and starting icinga fixed it
03:09 springle: killed db1068 xtrabackup, saturating db1064 network
03:01 mutante: icinga broken on neon due to missing servicegroup 'analytics_eqiad'
02:30 logmsgbot: LocalisationUpdate completed (1.24wmf5) at 2014-05-21 02:29:03+00:00
02:19 springle: xtrabackup clone db1049 to db1068
02:18 logmsgbot: springle synchronized wmf-config/db-eqiad.php 's4 reduce db1049 load while cloning'
02:16 logmsgbot: LocalisationUpdate completed (1.24wmf4) at 2014-05-21 02:15:36+00:00
02:04 logmsgbot: springle synchronized wmf-config/db-eqiad.php 's4 raise db1056 to normal load, depool db1011'

May 20

23:53 logmsgbot: maxsem synchronized php-1.24wmf4/extensions/MobileFrontend/ 'https://gerrit.wikimedia.org/r/#/c/134504/'
23:52 logmsgbot: maxsem synchronized php-1.24wmf5/extensions/MobileFrontend/ 'https://gerrit.wikimedia.org/r/#/c/134504/'
23:40 logmsgbot: maxsem synchronized php-1.24wmf5/extensions/MobileFrontend/ 'https://gerrit.wikimedia.org/r/134517'
23:39 logmsgbot: maxsem synchronized php-1.24wmf4/extensions/MobileFrontend/ 'https://gerrit.wikimedia.org/r/134517'
23:14 logmsgbot: maxsem synchronized php-1.24wmf4/extensions/MobileFrontend 'https://gerrit.wikimedia.org/r/#/c/134405/'
23:11 logmsgbot: maxsem synchronized php-1.24wmf5/extensions/MobileFrontend 'https://gerrit.wikimedia.org/r/#/c/134405/'
22:00 logmsgbot: aaron synchronized wmf-config/filebackend.php '69201b4caf703ef1ab52b38be29c80b4e939fdc2 - no-op'
21:57 logmsgbot: aaron synchronized wmf-config/filebackend.php 'Removed old tampa config'
21:41 logmsgbot: bsitu synchronized wmf-config/InitialiseSettings.php 'Re-enable flow on mediawiki:Talk:Design'
21:39 logmsgbot: bsitu updated /a/common to I037cd0a42: Re-enable flow on Talk:Design ( Removed LQT code )
21:30 logmsgbot: bsitu synchronized wmf-config/InitialiseSettings.php 'Disable flow on mediawiki:Talk:Design'
21:28 logmsgbot: bsitu updated /a/common to Idde23abd3: Undo "enable flow on Talk:Design"
21:11 logmsgbot: bsitu synchronized wmf-config/InitialiseSettings.php 'Enable Flow on 3 mediawiki talk pages'
21:08 logmsgbot: bsitu updated /a/common to I549967ca2: Group1 wikis to 1.24wmf5
20:35 Krinkle: Reload zuul to deploy I80496db747a8668be
19:14 hoo: fixed Wikidata for php-1.24wmf5 on mw1138 by manually removing it and then running sync-common
19:09 bd808: ran sync-common on mw1138
19:02 bd808: Updated scap to 7b6fc47
18:44 logmsgbot: aude synchronized php-1.24wmf5/extensions/Wikidata 'Fix jquery error tooltip issue'
18:33 bd808|deploy: Gave up on updating scap with trebuchet
18:32 logmsgbot: bd808 rebuilt wikiversions.cdb and synchronized wikiversions files: group1 to 1.24wmf5
18:22 RobH: restarting salt minions on mw servers
18:12 bd808|deploy: `git deploy sync` for scap ended with "0/230 minions completed fetch"
15:15 andrewbogott: running salt "ms-be*" cmd.run "kill $(ps aux | grep 'find / -user' | awk '{print $2}')" to kill runaway 'finds' on swifts
14:47 andrewbogott: running 'find' commands on many hosts to chown files for users with new UIDs.
13:20 Krinkle: git-deploy: Deploying integration/slave-scripts I4a4e2a4c90fb6
03:38 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue May 20 03:37:32 UTC 2014 (duration 37m 31s)
02:34 logmsgbot: LocalisationUpdate completed (1.24wmf5) at 2014-05-20 02:33:31+00:00
02:20 logmsgbot: LocalisationUpdate completed (1.24wmf4) at 2014-05-20 02:19:33+00:00

May 19

23:37 logmsgbot: catrope synchronized wmf-config/InitialiseSettings.php 'autopatrolled group on dewikivoyage'
23:26 logmsgbot: catrope synchronized php-1.24wmf5/extensions/VisualEditor
23:25 logmsgbot: catrope synchronized php-1.24wmf5/extensions/MobileFrontend
23:24 logmsgbot: catrope synchronized php-1.24wmf5/extensions/Wikidata
22:19 andrewbogott: Coren restarted opendj and I restarted pdns on virt1000. Opendj was refusing connections for unclear reasons
21:34 Krinkle: Running deleteEqualMessages.php on urwiki (bug 43917)
21:33 Krinkle: Running deleteEqualMessages.php on commonswiki (bug 43917)
20:53 superm401: sync-l10nupdate-1 1.24wmf4 had one error on mw1218
20:27 gwicke: updated Parsoid to 3ac048d7c4b
20:05 csteipp: fix deployed for bug 65501
20:00 Krinkle: Running deleteEqualMessages.php on fowiki (bug 43917)
20:00 Krinkle: Running deleteEqualMessages.php on enwikinews (bug 43917)
19:59 Krinkle: Reloading zuul to deploy I0b8051074da39edcac
19:16 bd808: Added display of exception-json events to fatalmonitor logstash dashboard
19:01 logmsgbot: bd808 Purged l10n cache for 1.24wmf3
19:00 logmsgbot: bd808 Purged l10n cache for 1.23wmf22
18:59 logmsgbot: bd808 Purged l10n cache for 1.23wmf21
18:59 logmsgbot: bd808 Purged l10n cache for 1.23wmf21
18:46 logmsgbot: reedy Finished scap: nooop to test for errors (duration: 02m 45s)
18:44 logmsgbot: reedy Started scap: nooop to test for errors
18:43 Reedy: rm -rf /usr/local/apache/common-local/php-1.23wmf20 against all apaches
18:37 Reedy: Ran sync-common locally on mw1015
18:32 greg-g: mw1010.eqiad.wmnet::common for sync-common, not sure for cdb. (sez superm401)
18:23 superm401: 1 server failed for sync-common. 2 servers failed for sync-rebuild-cdbs
18:18 logmsgbot: mattflaschen Finished scap: Deploy GettingStarted and enable experiment for de, en, fr, and it (duration: 18m 53s)
17:59 logmsgbot: mattflaschen Started scap: Deploy GettingStarted and enable experiment for de, en, fr, and it
15:26 logmsgbot: manybubbles synchronized php-1.24wmf5/resources/lib/oojs-ui/ 'fix panellayout'
15:18 logmsgbot: manybubbles synchronized php-1.24wmf4/extensions/CirrusSearch/ 'adding url parameter to suppress snippets and one to suggest suggestions to cirrus'
15:12 manybubbles: SWAT deployed cirrus update for wmf5 and looks good. doing for wmf4 now.
15:10 logmsgbot: manybubbles synchronized php-1.24wmf5/extensions/CirrusSearch/
14:53 logmsgbot: manybubbles synchronized wmf-config/InitialiseSettings.php 'touched and synced InitialiseSettings.php to make update to cirrus.dblist take hold - resyncing to mw1171'
14:51 _joe_: powercycled mw1171, dead and serial console stuck
14:51 logmsgbot: manybubbles synchronized cirrus.dblist 'Switch cirrus to the primary backend for zh-yue wikipedia - resyncing to mw1171'
14:40 logmsgbot: manybubbles synchronized wmf-config/InitialiseSettings.php 'touched and synced InitialiseSettings.php to make update to cirrus.dblist take hold'
14:38 logmsgbot: manybubbles synchronized cirrus.dblist 'Switch cirrus to the primary backend for zh-yue wikipedia'
13:34 Krinkle: Running deleteEqualMessages.php on mtwiki (bug 43917)
13:17 Krinkle: Running deleteEqualMessages.php on zh_min_nanwiki (bug 43917)
13:07 Krinkle: Running deleteEqualMessages.php on zh_yuewiki (bug 43917)
12:01 Krinkle: Running deleteEqualMessages.php on suwiki (bug 43917)
03:10 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon May 19 03:09:00 UTC 2014 (duration 8m 59s)
02:26 logmsgbot: LocalisationUpdate completed (1.24wmf5) at 2014-05-19 02:25:51+00:00
02:14 logmsgbot: LocalisationUpdate completed (1.24wmf4) at 2014-05-19 02:13:37+00:00
00:30 Tim: on osmium: stopping job runners in order to fix cgroup permissions issue

May 18

03:07 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun May 18 03:06:03 UTC 2014 (duration 6m 2s)
02:24 logmsgbot: LocalisationUpdate completed (1.24wmf5) at 2014-05-18 02:23:29+00:00
02:13 logmsgbot: LocalisationUpdate completed (1.24wmf4) at 2014-05-18 02:12:46+00:00

May 17

03:08 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat May 17 03:07:12 UTC 2014 (duration 7m 11s)
02:25 logmsgbot: LocalisationUpdate completed (1.24wmf5) at 2014-05-17 02:24:46+00:00
02:15 logmsgbot: LocalisationUpdate completed (1.24wmf4) at 2014-05-17 02:14:13+00:00
01:58 mutante: powercycling labsdb1003
00:23 ori: varnishadm on cp1056 confirms that varnish recognizes mw1151 as "sick"
00:20 ori: stopping apache and disabling puppet on mw1151 so that varnish stops forwarding reqs to it
00:16 Krinkle: On mw1151, Gadget::loadStructuredList() returns false, memcached has no value for 'enwiki:gadgets-definition:7' and is unable to store it.
00:11 ori: Krinkle identified weird RL responses as all originating in mw1151; dmesg shows ata1 disk troubles: "failed command: READ DMA EXT", "sd 0:0:0:0: [sda] Add. Sense: Unrecovered read error - auto reallocate failed"
00:04 logmsgbot: krinkle synchronized php-1.24wmf4/includes/resourceloader/ResourceLoader.php 'I718fcf23d'

May 16

23:07 awight: updated crm from 0b8aa8aa046935b6cfc67c10ebe10396d5e42745 to 65a433b5564f42c3aa4f310cd4bb938ae70f841d
22:56 logmsgbot: aaron synchronized php-1.24wmf4/includes/db/Database.php '182e42c173b9ab0c2bc5d753879a000b1ff39e77'
22:54 logmsgbot: aaron synchronized php-1.24wmf5/includes/db/Database.php '8829ffc72d3332d348a1a2e58d525e54e126bad5'
22:17 awight: tools updated from 85bb7293d83517086e3609f03365aecde9f58c71 to ee31fc94b17c11a48ddac19aabfcdaab69fd2f72
21:37 logmsgbot: ori synchronized php-1.24wmf4/extensions/MultimediaViewer 'Update MultimediaViewer for I0df067a61: Add sampling to unsampled event logging'
21:33 logmsgbot: ori synchronized php-1.24wmf5/extensions/MultimediaViewer 'Update MultimediaViewer for I0df067a61: Add sampling to unsampled event logging'
21:10 mwalker: updating fundraising smashpig from 2fdf982b20f1cbeaf9f57af64ef21b5b69a36f6e to f64f79f13cf4ab560d0bb5bd69690c827a821629
20:41 awight: update crm from 243641de631b712c4a29ca1f3618771b78dadeae to 0b8aa8aa046935b6cfc67c10ebe10396d5e42745
18:43 awight: update tools from a40c0caa18a0efd93bc5d3f7f68386fbc36bf1fa to 85bb7293d83517086e3609f03365aecde9f58c71
18:12 logmsgbot: ori synchronized wmf-config/InitialiseSettings.php 'I3c453b0949f4e: Tweak MediaViewer sampling settings'
18:07 logmsgbot: ori synchronized wmf-config 'Ia43821231: Add sampling control setting for MediaViewer event'
18:04 logmsgbot: ori updated /a/common to Ia43821231: Add sampling control setting for MediaViewer event logging
17:47 logmsgbot: demon synchronized wmf-config/CommonSettings.php 'GeoData to Elastic for all wikivoyages'
17:11 mwalker: updating staging payments servers as well from 5e24b953dcff5305099e152139e6e93daba8aeec to d99177518b741e7fe18ffda86c83f93c72e164a6
17:10 mwalker: and updated to 1.22.6
17:10 mwalker: moved fundraising wiki from 6a1d4983319038edeb88dc34a1c220ecaec1cbde to d99177518b741e7fe18ffda86c83f93c72e164a6 -- including json i18n changes
16:55 manybubbles: "in place" reindexing (for cirrus) all the wikipedias after the deploy train hit them yesterday
16:53 logmsgbot: demon synchronized wmf-config/CommonSettings.php 'Removing old WikiEditor settings'
16:44 RobH: partial zirconium downtime
16:44 RobH: i logged into zirconium, but it had recovered by the time I checked it.
16:33 mwalker: updated fundraising civicrm from 7a23465e620211739421cce3ad57c62597eb8cc3 to 75c1a50b8aa7e7b6f218d7c420932a8fc53a0a34 for an exchange rates fix
16:26 qchris: updated gerrit's hooks-bugzilla plugin to version 2.8.1.2 to allow talking to bugzilla-4.4.4
13:54 logmsgbot: springle synchronized wmf-config/db-eqiad.php 'repool db1056 in s4, warm up'
13:17 logmsgbot: springle synchronized wmf-config/db-eqiad.php 'raise db1070 and db1071 to normal load'
10:14 springle: xtrabackup clone db1049 to db1056
10:13 logmsgbot: springle synchronized wmf-config/db-eqiad.php 'reduce db1049 load while cloning'
09:40 logmsgbot: springle synchronized wmf-config/db-eqiad.php 'pool db1070 and db1071 in s1, warm up'
06:20 logmsgbot: ori synchronized php-1.24wmf4/maintenance/compareParserCache.php 'Ica69a3ef2: Added a script to compare current parser output to cache (no impact on prod; syncing for consistency)'
03:55 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri May 16 03:54:03 UTC 2014 (duration 54m 2s)
03:09 logmsgbot: LocalisationUpdate completed (1.24wmf5) at 2014-05-16 03:08:04+00:00
02:39 logmsgbot: LocalisationUpdate completed (1.24wmf4) at 2014-05-16 02:38:41+00:00
02:07 springle: xtrabackup db1070 to db1071
01:02 logmsgbot: ori synchronized php-1.24wmf5/includes/parser 'I12a60b5cc: Revert "Declare visibility on class properties of includes/parser/"'
00:54 hoo: rebuildItemsPerSite finished running for Wikidata (after about 30h).
00:32 hoo: manually ran rebuildEntityPerPage for Wikidata to fix 2 broken records
00:07 logmsgbot: maxsem synchronized wmf-config 'https://gerrit.wikimedia.org/r/#/c/131762/'
00:00 logmsgbot: ori synchronized wmf-config/squid.php 'Id188979c1: Use whole subnets in squid.php list for XFF acceptance'

May 15

23:57 logmsgbot: ori updated /a/common to Id188979c1: Use whole subnets in squid.php list for XFF acceptance
23:38 logmsgbot: ori synchronized php-1.24wmf4/includes 'Ia3b12fb9: Speed up CIDR matching from $wgSquidServersNoPurge'
23:19 logmsgbot: ori synchronized php-1.24wmf5/includes 'Ia3b12fb9: Speed up CIDR matching from $wgSquidServersNoPurge'
23:05 logmsgbot: ori synchronized wmf-config/CirrusSearch-production.php 'Iae07852b1: Elasticsearch plugin juggling'
22:54 logmsgbot: ori synchronized wmf-config 'I51a55c4e2, Ia6c01a913, I594848ce0, and I594848ce0'
22:50 logmsgbot: ori updated /a/common to Ifae836de5: Swapping GeoData backend for enwikivoyage
22:28 logmsgbot: ori synchronized php-1.24wmf4/extensions/EventLogging 'Update EventLogging to master for I89819bd943'
22:26 logmsgbot: ori synchronized php-1.24wmf5/extensions/EventLogging 'Update EventLogging to master for I89819bd943'
22:06 awight: updated tools from 93fda5da99674eca221e0abf53ad499583b27cfb to a40c0caa18a0efd93bc5d3f7f68386fbc36bf1fa
22:04 logmsgbot: demon synchronized wmf-config/InitialiseSettings.php 'GeoData using elasticsearch on enwikivoyage'
21:04 awight: updated all tools from 47407c16d9922b17af70146416913abfe50b728d to 93fda5da99674eca221e0abf53ad499583b27cfb
20:16 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'I5da4aa5db7b5d3c1843a6fd68d0a7c62a2bbfb4e'
19:56 mwalker: updated fundraising tools repo for screenshots, worldpay auditing, live analysis, and... stomp! from 0eb485c8b6db5f06805976860bce7aa8b0d6444b to 47407c16d9922b17af70146416913abfe50b728d
19:09 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.24wmf5
18:55 ori: deploying twemproxy module on mw106*, they may complain for a moment
18:52 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikipedias to 1.24wmf4
18:46 logmsgbot: reedy Finished scap: testwiki to 1.24wmf5 and build l10n cache (duration: 27m 47s)
18:32 mutante: mw1053 was already disabled in pybal though and RT 7408,7435
18:31 mutante: mw1053 sits at disk partitioning dialog (via mgmt)
18:29 Reedy: mw1053 is pingable but not ssh-able
18:18 logmsgbot: reedy Started scap: testwiki to 1.24wmf5 and build l10n cache
17:53 Jeff_Green: adjusted exim conf on mchenry to route donate.wm.o mail to barium instead of aluminium
16:43 mwalker: disabled qc and put site_offline and maintenance_mode on civicrm to true
15:20 logmsgbot: anomie synchronized php-1.24wmf4/extensions/MultimediaViewer 'SWAT: Deploy change 133446 to fix bug 65225 in MultimediaViewer'
14:03 springle: xtrabackup clone db1056 to db1070
13:59 logmsgbot: springle synchronized wmf-config/db-eqiad.php 'depool db1056 while cloning'
13:44 cmjohnson1: sodium going down again for a different disk replacement
13:16 cmjohnson1: shutting down sodium to replace sdb
12:56 godog: restarting gerrit on ytterbium, clones over https seemingly stuck
12:24 manybubbles|away: "in place" reindexing group1 wikis after the deployment train updated cirrus yesterday. They'll need a full reindex after that is done which will take some time but is required to fix issues with redirects not showing up off of the main namespace
11:56 godog: installed openjdk-7-jdk on ytterbium to attempt gerrit thread dump
10:15 logmsgbot: springle synchronized wmf-config/db-eqiad.php 'depool db1009 for raid tests'
06:44 logmsgbot: springle synchronized wmf-config/db-eqiad.php 'move s5 api traffic to db1005'
05:19 logmsgbot: springle synchronized wmf-config/db-eqiad.php 'move s4 commonswiki api traffic to db1042'
04:20 springle: installed db1073
03:15 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu May 15 03:14:04 UTC 2014 (duration 14m 3s)
02:27 logmsgbot: LocalisationUpdate completed (1.24wmf4) at 2014-05-15 02:26:09+00:00
02:15 logmsgbot: LocalisationUpdate completed (1.24wmf3) at 2014-05-15 02:14:31+00:00

May 14

23:42 logmsgbot: mwalker synchronized wmf-config/InitialiseSettings.php 'Poking settings to try and apply them'
23:29 logmsgbot: mwalker synchronized visualeditor.dblist 'Another part of 132409 (visual editor)'
23:27 K4-713: updated payments from 78cc4285bdeb6eecba3efc75e4a04c8b886561e4 to 5e24b953dcff5305099e152139e6e93daba8aeec
23:27 logmsgbot: mwalker synchronized wmf-config/ 'SWAT of 132409 (visual editor) and 130274 (abuse filter)'
22:04 logmsgbot: maxsem synchronized php-1.24wmf3/extensions/MobileFrontend/ 'bug 65042'
22:03 marktraceur: cscott deployed a jenkins job change that pushes parsoid git files to beta-labs for version purposes
22:03 logmsgbot: maxsem synchronized php-1.24wmf4/extensions/MobileFrontend/ 'bug 65042'
20:38 awight: updated crm from 3fd3b94834f94529841ad4a695ecd73c98e487bc to 7a23465e620211739421cce3ad57c62597eb8cc3
20:32 bd808: Restarting logstash on logstash1001.eqiad.wmnet due to missing messages from some (all?) logs
19:58 logmsgbot: demon synchronized wmf-config/InitialiseSettings.php 'No more LQT on wikimania2011wiki'
18:32 Krinkle: integration-slave1001 had its 8GB / /dev/vda1 100% full. Purging /tmp/perf-*.map brought it back to 41%
18:25 Krinkle: integration-slave1001 is having issues writing to disk
17:50 logmsgbot: yurik synchronized php-1.24wmf4/extensions/ZeroRatedMobileAccess/
17:47 logmsgbot: yurik synchronized php-1.24wmf3/extensions/ZeroRatedMobileAccess/
17:30 logmsgbot: yurik synchronized wmf-config/CommonSettings.php
15:44 chasemp: disabling puppet on tungsten to try tweaking carbon settings to affect queue drops (for the better)
14:28 cmjohnson1: mw1053 going down for disk replacement
13:27 bblack: restarting pybals on lvs300x
12:30 _joe_: restarted uwsgi on tungsten
09:46 mark: Started PyBal on lvs300* and established BGP sessions with the routers
09:43 mark: Setup BGP configuration for lvs300* on cr1-esams and cr2-knams, with elevated MEDs to keep them as last resorts
04:18 Tim: deploying apache configuration change with fixes
03:11 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed May 14 03:10:36 UTC 2014 (duration 10m 35s)
03:01 Tim: reverting apache change
02:53 Tim: deploying apache configuration change https://gerrit.wikimedia.org/r/106109
02:26 logmsgbot: LocalisationUpdate completed (1.24wmf4) at 2014-05-14 02:25:08+00:00
02:21 springle: upgrade db1043, rebuild as m3 master
02:14 logmsgbot: LocalisationUpdate completed (1.24wmf3) at 2014-05-14 02:13:23+00:00
00:18 manybubbles: bouncing elasticsearch on elastic1015 to pick up gc logging configuration. it might warn but shouldn't cause any service disruption.

May 13

23:27 logmsgbot: maxsem synchronized php-1.24wmf4/includes/exception/MWException.php 'https://gerrit.wikimedia.org/r/#/c/133184/'
23:25 logmsgbot: maxsem synchronized php-1.24wmf3/includes/exception/MWException.php 'https://gerrit.wikimedia.org/r/#/c/133183/'
23:21 MaxSem: Ran namespaceDupes after adding new namespaces to zhwikisource - no problems found
23:18 logmsgbot: maxsem synchronized wmf-config/InitialiseSettings.php 'https://gerrit.wikimedia.org/r/127584'
22:43 chasemp: ms-be1009 rebooted as it had locked up, swift seems to have recovered
22:35 mutante: created new gerrit projects for phabricator, arcanist and libphutil
22:22 ori: restarting tungsten to verify fix for gdash/graphite initialization
21:33 ori: gdash and graphite currently down; chase & ori debugging
21:29 manybubbles: I caused elasticsearch1015 to drop out of the Elasticsearch cluster by trying to take a heap dump on it. don't do that. It stops the application for many seconds.
19:07 logmsgbot: yurik synchronized php-1.24wmf3/extensions/ZeroRatedMobileAccess/
19:02 logmsgbot: yurik synchronized php-1.24wmf4/extensions/ZeroRatedMobileAccess/
18:49 cmjohnson1: replacing failed disk dataset1001
18:25 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'touch for I1681addaed690b652822c0296b7a3e9b84de93b6'
18:22 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias to 1.24wmf4
15:52 logmsgbot: anomie synchronized php-1.24wmf4/resources/Resources.php 'SWAT: Deploy jQuery Migrate to 1.24wmf4'
15:51 logmsgbot: anomie synchronized php-1.24wmf4/resources/lib/jquery/jquery.migrate.js 'SWAT: Deploy jQuery Migrate to 1.24wmf4'
04:09 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue May 13 04:07:53 UTC 2014 (duration 7m 52s)
03:01 logmsgbot: LocalisationUpdate completed (1.24wmf4) at 2014-05-13 03:00:07+00:00
02:31 logmsgbot: LocalisationUpdate completed (1.24wmf3) at 2014-05-13 02:30:33+00:00

May 12

23:28 logmsgbot: yurik synchronized php-1.24wmf3/extensions/ZeroRatedMobileAccess/
23:24 logmsgbot: yurik synchronized php-1.24wmf4/extensions/ZeroRatedMobileAccess/
20:11 gwicke: deployed Parsoid d1c778ea3
17:00 mutante: re-enabled mw1186 in pybal
16:58 logmsgbot: manybubbles Finished scap: scapping again to get mw1186 synced up (duration: 00m 56s)
16:57 logmsgbot: manybubbles Started scap: scapping again to get mw1186 synced up
16:49 mutante: disabled mw1186 in pybal
16:46 mutante: mw1186 - down, powercycling
16:39 logmsgbot: manybubbles Finished scap: fix php symlink (duration: 04m 25s)
16:35 logmsgbot: manybubbles Started scap: fix php symlink
16:09 logmsgbot: manybubbles synchronized wmf-config/InitialiseSettings.php
16:07 logmsgbot: manybubbles Finished scap: update visual editor for swat deploy (duration: 28m 58s)
15:38 logmsgbot: manybubbles Started scap: update visual editor for swat deploy
11:01 springle: killed bunch of slow Flow\Formatter\ContributionsQuery::queryRevisions queries on flowdb
03:14 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon May 12 03:13:40 UTC 2014 (duration 13m 39s)
02:26 logmsgbot: LocalisationUpdate completed (1.24wmf4) at 2014-05-12 02:25:54+00:00
02:15 logmsgbot: LocalisationUpdate completed (1.24wmf3) at 2014-05-12 02:14:23+00:00

May 11

15:26 logmsgbot: reedy synchronized php-1.24wmf3/thumb.php
15:23 logmsgbot: reedy synchronized php-1.24wmf4/thumb.php
14:21 cmjohnson1: power cycling asw-d5-eqiad
03:11 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun May 11 03:10:37 UTC 2014 (duration 10m 36s)
02:23 logmsgbot: LocalisationUpdate completed (1.24wmf4) at 2014-05-11 02:22:02+00:00
02:13 logmsgbot: LocalisationUpdate completed (1.24wmf3) at 2014-05-11 02:12:49+00:00

May 10

19:58 logmsgbot: hoo synchronized php-1.24wmf3/extensions/Wikidata/ 'Resyncing Wikidata for mw1122'
18:31 hoo: approved an oauth request by Aaron Halfaker by making myself oauth admin for a moment
17:36 logmsgbot: hoo synchronized php-1.24wmf4/extensions/Wikidata/ 'Update Wikidata to fix the JSON dump generation'
17:35 logmsgbot: hoo synchronized php-1.24wmf3/extensions/Wikidata/ 'Update Wikidata to fix the JSON dump generation'
17:11 marktraceur: pushed new uploadwizard qunit job to Jenkins
16:30 logmsgbot: reedy updated /a/common to I415e67919: Memory limit to 235M
15:42 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'I415e679197b97e2babe50544cf1e8c26c13a598a'
13:59 logmsgbot: bsitu synchronized php-1.24wmf4/extensions/Flow 'Update Flow'
13:46 Krinkle: Reloading Zuul to deploy I403760f1f6dd1bc2
12:07 logmsgbot: hoo synchronized php-1.24wmf4/extensions/Wikidata/ 'Update Wikibase to fix performance issues with dumpJson'
12:05 logmsgbot: hoo synchronized php-1.24wmf3/extensions/Wikidata/ 'Update Wikibase to fix performance issues with dumpJson (2nd run)'
12:03 logmsgbot: hoo synchronized php-1.24wmf3/extensions/Wikidata/ 'Update Wikibase to fix performance issues with dumpJson'
08:53 paravoid: rack D5 down, switch unresponsive; minimal impact (mw1201-1203, 1208-1210)
06:56 logmsgbot: aaron synchronized php-1.24wmf3/img_auth.php '264967c58eccb6dae872ab7345d08f8381ac43a7'
06:46 logmsgbot: aaron synchronized php-1.24wmf4/img_auth.php 'b08af402ef2de7b2c79f71d848c2b8ae98b47be0'
03:13 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat May 10 03:12:27 UTC 2014 (duration 12m 26s)
02:26 logmsgbot: LocalisationUpdate completed (1.24wmf4) at 2014-05-10 02:25:31+00:00
02:16 logmsgbot: LocalisationUpdate completed (1.24wmf3) at 2014-05-10 02:15:38+00:00
00:02 awight: updated crm from 237d463ed3e275c217a4f497ed30d2f7f20100eb to 3fd3b94834f94529841ad4a695ecd73c98e487bc

May 9

23:09 K4-713: updated payments cluster from 3be44f5d14c00a893a985f3aad86b6b59507a987 to 78cc4285bdeb6eecba3efc75e4a04c8b886561e4
19:24 logmsgbot: reedy synchronized wmf-config/ 'I5265c408443212536a5ed96d910caba50c22e767'
18:56 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Enable EducationProgram on ukwiki'
18:52 logmsgbot: reedy updated /a/common to I0d1ea1639: Remove 1.23wmf13 through 1.23wmf20
18:48 logmsgbot: reedy synchronized docroot and w
18:44 logmsgbot: reedy updated /a/common to I25d891030: Do not optimize commons for new highlighter
18:41 Reedy: Created EducationProgram tables on ukwiki
15:45 manybubbles: reindexing commons to unbreak file searches on wikis not using the experimental highlighter
15:43 logmsgbot: demon synchronized wmf-config/CirrusSearch-common.php 'Do not optimize commons for new highlighter on commons'
15:03 manybubbles: rebuilding hewiki's cirrus index so it can pick up hebmorph too
14:52 logmsgbot: spage synchronized php-1.24wmf4/extensions/Flow 'Fix Flow add new topics and reply in 1.24wmf4'
13:40 logmsgbot: reedy updated /a/common to I721c36406: Add en-rtl to wgExtraLanguageNames for beta
13:24 logmsgbot: reedy updated /a/common to I75a80a998: beta: create a RTL english wiki
10:43 logmsgbot: springle synchronized wmf-config/db-eqiad.php 'raise db1067 and db1066 to normal load. depool db1043'
09:32 logmsgbot: hoo synchronized php-1.24wmf3/extensions/Wikidata/ 'Fix Job injection error handling'
09:24 logmsgbot: hoo synchronized php-1.24wmf3/extensions/Wikidata/ 'Fix Job injection error handling'
08:36 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'I4657fe64572fb3db22e3b48a87df7112b2248e35'
08:21 hashar: apt-get upgraded apache on gallium and lanthanum
08:18 hashar: Jenkins: unpooled integration-slave1001 and rebooting the instance.
08:16 hashar: Jenkins: unpooled integration-slave1002 and rebooting the instance.
08:16 hashar: restarting Zuul (seems some jobs are not properly registered)
06:55 logmsgbot: springle synchronized wmf-config/db-eqiad.php 'pool db1066 in s2, db1067 in s1, warm up'
05:00 springle: installed db1067, db1070, db1071
04:09 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri May 9 04:08:07 UTC 2014 (duration 8m 6s)
03:47 springle: xtrabackup clone db1036 to db1067, db1051 to db1066
03:46 logmsgbot: springle synchronized wmf-config/db-eqiad.php 'reduce db1036 and db1051 load while cloning'
03:24 springle: installed db106[678]
03:09 logmsgbot: LocalisationUpdate completed (1.24wmf4) at 2014-05-09 03:08:55+00:00
02:41 logmsgbot: LocalisationUpdate completed (1.24wmf3) at 2014-05-09 02:40:06+00:00
00:03 logmsgbot: maxsem synchronized php-1.24wmf3/extensions/MobileFrontend/ 'bug 65042'
00:01 logmsgbot: maxsem synchronized php-1.24wmf4/extensions/MobileFrontend/ 'bug 65042'

May 8

23:11 logmsgbot: maxsem synchronized php-1.24wmf3/extensions/MobileFrontend 'https://gerrit.wikimedia.org/r/132299'
23:08 logmsgbot: demon synchronized wmf-config/CirrusSearch-common.php 'Replica count for commonswiki_file -- syncing with what's already live'
23:07 logmsgbot: maxsem synchronized php-1.24wmf4/extensions/MobileFrontend 'https://gerrit.wikimedia.org/r/132299'
21:30 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'wmgVectorBetaPersonalBar to true for all wikis'
21:28 logmsgbot: reedy updated /a/common to I44f67444c: group0 to 1.24wmf4
20:37 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikipedias to 1.24wmf3 and group0 to 1.24wmf4
20:32 logmsgbot: reedy updated /a/common to I11e5ca294: FUTURE: Fifth batch of pilot sites for Media Viewer
20:05 logmsgbot: demon synchronized wmf-config/InitialiseSettings.php 'touch'
20:04 logmsgbot: demon synchronized mediaviewer.dblist 'mediaviewer for svwiki, eswiki, jawiki, ptwiki'
18:56 DJ-K4: enabled fredge queue consumption + jenkins job
17:10 logmsgbot: reedy Finished scap: Build l10n cache for 1.24wmf4 and move testwiki (duration: 17m 05s)
16:53 logmsgbot: reedy Started scap: Build l10n cache for 1.24wmf4 and move testwiki
16:52 chasemp: rebooted ms-be1006 since it dropped dead
16:52 logmsgbot: reedy Finished scap: Build l10n cache for 1.24wmf4 and move testwiki (duration: 11m 42s)
16:42 manybubbles: reindexing the hebrew wikis other than hewikipedia now that they are on wmf3 so they can have hebmorph
16:40 logmsgbot: reedy Started scap: Build l10n cache for 1.24wmf4 and move testwiki
16:39 manybubbles: rebuilding enwiki's cirrus index to optimize for new highlighter
16:14 logmsgbot: demon rebuilt wikiversions.cdb and synchronized wikiversions files: group1 wikis to wmf3
16:13 logmsgbot: springle synchronized wmf-config/db-eqiad.php 'raise db106[45] to normal load'
15:13 logmsgbot: manybubbles synchronized php-1.24wmf3/extensions/CirrusSearch/ 'updating Cirrus to pick up some fixes'
15:08 logmsgbot: manybubbles synchronized wmf-config/InitialiseSettings.php 'engage new highlighter on some more wikis'
15:00 manybubbles: rebuilding all hebrew wikis _except_ hebrew wikipedia and hebrew wikisource to pick up hebmorph. hewikisource got it this morning. hewiki will get it this afternoon after the deployment train
13:24 logmsgbot: springle synchronized wmf-config/db-eqiad.php 'warm up db1064 in s4, db1065 in s1'
12:59 manybubbles: rebuilding cirrus index for hewikisource to pick up hebmorph
10:18 logmsgbot: springle synchronized wmf-config/db-eqiad.php 'reduce db1049 and db1051 load while cloning'
10:16 springle: xtrabackup clone db1051 to db1065
09:35 springle: xtrabackup clone db1049 to db1064
08:55 springle: installed db106[45]
08:35 logmsgbot: reedy synchronized docroot and w
08:18 logmsgbot: reedy synchronized php-1.24wmf4 'staging'
08:01 logmsgbot: reedy updated /a/common to I7f2d2b25d: Allow all users on OfficeWiki to send mass messages
03:44 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu May 8 03:43:39 UTC 2014 (duration 43m 38s)
02:57 logmsgbot: LocalisationUpdate completed (1.24wmf3) at 2014-05-08 02:56:02+00:00
02:29 logmsgbot: LocalisationUpdate completed (1.24wmf2) at 2014-05-08 02:28:17+00:00

May 7

23:36 logmsgbot: demon Finished scap: no-op scap, for ori (duration: 09m 01s)
23:27 logmsgbot: demon Started scap: no-op scap, for ori
23:17 logmsgbot: ori Finished scap: No changes; testing scap to osmium (again) (duration: 03m 25s)
23:14 logmsgbot: ori Started scap: No changes; testing scap to osmium (again)
23:14 logmsgbot: ori scap aborted: No changes; testing scap to osmium (duration: 11m 27s)
23:06 awight: update crm from 1740219e38091ba4f7afe6545ea189b27340bf86 to 237d463ed3e275c217a4f497ed30d2f7f20100eb
23:02 logmsgbot: ori Started scap: No changes; testing scap to osmium
21:37 K4-713: updated payments from a7fa0d64da2c56586c83cf92babb65bac857be2e to 3be44f5d14c00a893a985f3aad86b6b59507a987
21:00 K4-713: synchronized payments from 4811f6d3d80d126c to a7fa0d64da2c56586
20:49 K4-713: revlocked payments to 4811f6d3d80d126c due to strange errors during an attempted deploy of a7fa0d64da2c56586
20:17 awight: drush pm-uninstall wmf_fredge_qc
20:16 awight: bad call: drush en wmf_fredge_qc -- need to rollback the module schema version and try again with "fredge" creds
20:14 awight: crm updated from cfe34fe0b10861167199a8f72bba279b9cac5e6e to 1740219e38091ba4f7afe6545ea189b27340bf86
20:07 subbu: deployed parsoid 71f4e884 (with deploy sha 9a62899d)
17:19 logmsgbot: yurik synchronized php-1.24wmf2/extensions/ZeroRatedMobileAccess/
17:16 logmsgbot: yurik synchronized php-1.24wmf3/extensions/ZeroRatedMobileAccess/
16:33 manybubbles: performing a rolling restart on elasticsearch nodes in production to pick up new plugins: experimental-highlight 0.0.8 and analysis-hebrew 1.1.0
16:30 _joe_: restarted mwprof/profiler-to-carbon on tungsten, stuck somehow
15:40 logmsgbot: demon synchronized wmf-config/CirrusSearch-common.php 'Raised redundancy for commonswiki_file back up, config to match'
15:27 logmsgbot: anomie synchronized wmf-config/InitialiseSettings.php 'SWAT: Remove obsolete $wmgCirrusIsBuilding (no functionality change)'
15:26 logmsgbot: anomie synchronized wmf-config/CirrusSearch-common.php 'SWAT: Remove obsolete $wmgCirrusIsBuilding (no functionality change)'
15:19 anomie: anomie namespaceDupes.php on OfficeWiki done (that was quick)
15:18 anomie: anomie Running maintenance/namespaceDupes.php on OfficeWiki
15:16 logmsgbot: anomie synchronized wmf-config/InitialiseSettings.php 'SWAT: Change wgMetaNamespace for OfficeWiki and add alias'
15:12 logmsgbot: anomie synchronized wmf-config/InitialiseSettings.php 'SWAT: Allow all users on OfficeWiki to send mass messages (for real this time)'
15:09 logmsgbot: anomie synchronized wmf-config/InitialiseSettings.php 'SWAT: Allow all users on OfficeWiki to send mass messages'
15:03 logmsgbot: anomie synchronized wmf-config/InitialiseSettings.php 'SWAT: Set $wgUploadMissingFileUrl for enwiki'
10:21 logmsgbot: springle synchronized wmf-config/db-eqiad.php 'db1049 to full steam'
09:40 logmsgbot: springle synchronized wmf-config/db-eqiad.php 'warm up db1049 in s4'
07:54 hashar: Jenkins: installing Claim plugin (allow folks to comment on builds and mark them)
06:48 springle: again
05:59 springle: powercycled unresponsive neon, swapdeath + oom killer
03:12 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed May 7 03:11:36 UTC 2014 (duration 11m 35s)
02:28 logmsgbot: LocalisationUpdate completed (1.24wmf3) at 2014-05-07 02:27:23+00:00
02:15 logmsgbot: LocalisationUpdate completed (1.24wmf2) at 2014-05-07 02:14:44+00:00
00:02 mutante: upgrading libtiff on imagescalers

May 6

23:18 logmsgbot: maxsem synchronized wmf-config/InitialiseSettings.php 'https://gerrit.wikimedia.org/r/131855'
21:57 mutante: gracefull'ing apaches
20:43 logmsgbot: demon synchronized php-1.24wmf3/extensions/CirrusSearch 'Rolling Cirrus back to known-good state'
20:42 logmsgbot: demon synchronized php-1.24wmf2/extensions/CirrusSearch 'Rolling Cirrus back to known-good state'
20:27 logmsgbot: demon synchronized php-1.24wmf3/extensions/CirrusSearch/includes/Hooks.php 'Fix typehinting'
20:26 logmsgbot: demon synchronized php-1.24wmf2/extensions/CirrusSearch/includes/Hooks.php 'Fix typehinting'
20:19 logmsgbot: demon synchronized php-1.24wmf3/extensions/CirrusSearch/CirrusSearch.php 'I2638b695: fix for page moves'
20:19 logmsgbot: demon synchronized php-1.24wmf2/extensions/CirrusSearch/CirrusSearch.php 'I2638b695: fix for page moves'
20:18 logmsgbot: demon synchronized php-1.24wmf3/extensions/CirrusSearch/includes/Hooks.php 'I2638b695: fix for page moves'
20:17 logmsgbot: demon synchronized php-1.24wmf2/extensions/CirrusSearch/includes/Hooks.php 'I2638b695: fix for page moves'
19:00 mutante: enabling puppet on netmon1001
17:38 hoo: Changed email for global account "ElphiBot".
16:48 logmsgbot: mattflaschen synchronized php-1.24wmf3/extensions/GettingStarted/ 'GettingStarted token and logging deployment'
16:45 logmsgbot: mattflaschen synchronized php-1.24wmf2/extensions/GettingStarted/ 'GettingStarted token and logging deployment'
16:19 hoo: Changed email for global account "Elph".
15:42 akosiaris: killed zuul processes on gallium and restarted the service
09:16 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue May 6 09:15:20 UTC 2014
09:08 logmsgbot: LocalisationUpdate completed (1.22wmf15) at Tue May 6 09:07:45 UTC 2014
09:01 logmsgbot: faidon synchronized wmf-config/squid.php 'add Swift to squid.php'
08:59 logmsgbot: faidon updated /a/common to Ica9086dcd: Add Swift frontends to squid.php
08:45 ottomata: re-enabling puppet agent on analytics1022; kafka broker is caught up there and is fully in all ISRs
08:36 ottomata: temporarily disabling puppet on analytics1026 to troubleshoot a camus import problem
06:55 springle: hammering dbstore1001 with dumps in screen session. ignore replag
06:31 ori: deleting rotated logs in /var/log/eventlogging/archive that are older than 90 days on vanadium
04:47 springle: mydumper/myloader clone db1042 to db1049
04:06 logmsgbot: springle synchronized wmf-config/db-eqiad.php 'depool db1049 for maintenance'
03:48 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue May 6 03:47:41 UTC 2014 (duration 47m 40s)
03:03 logmsgbot: LocalisationUpdate completed (1.24wmf3) at 2014-05-06 03:02:03+00:00
02:28 logmsgbot: LocalisationUpdate completed (1.24wmf2) at 2014-05-06 02:27:21+00:00
00:01 logmsgbot: ori synchronized wmf-config/CommonSettings.php 'Update wgFlowCacheVersion to 4.2'

May 5

23:57 logmsgbot: ori updated /a/common to Id1f2e0acf: Drop wgFlowCacheKey from CommonSettings.php
23:54 logmsgbot: ori Finished scap: SWAT deploy for VisualEditor and Flow cherry-picks (duration: 09m 55s)
23:44 logmsgbot: ori Started scap: SWAT deploy for VisualEditor and Flow cherry-picks
23:26 logmsgbot: ori synchronized php-1.24wmf3/extensions/EventLogging 'Update EventLogging for Id23b37fbe for SWAT.'
23:23 logmsgbot: ori synchronized php-1.24wmf2/extensions/EventLogging 'Update EventLogging for Id23b37fbe for SWAT.'
21:07 jgage: trying on analytics1022: https://wikitech.wikimedia.org/wiki/Analytics/Kraken/Kafka/Administration#Recovering_a_laggy_broker_replica
20:58 RobH: ssl1001-1003 now have updated unified cert in service
20:58 jgage: both kafka brokers back in service
20:54 RobH: cp4001-4020 unified cert and nginx service reloaded, back in service
20:50 RobH: ssl1006 and ssl1009 are responsive to nginx and back in service
20:43 RobH: ssl1009 was refusing connections both before and after my ssl cert update. ssl1006 is presently refusing connections post update. they are set to disabled in pybal
20:39 RobH: ssl1008 back into service, ssl1009 already depooled
20:38 jgage: forced kafka broker reelection
20:34 RobH: ssl1007 going back into service, ssl1008 depooling
20:25 RobH: depooled ssl1006/7 for update
20:25 RobH: ssl1004/5 returned to service (and puppet agents enabled)
20:21 RobH: puppet agent has been re-enabled on ssl1001-1003
20:19 RobH: ssl1004/5 disabled for update
20:18 RobH: putting ssl1002/3 back into service
20:15 subbu: deployed parsoid f2f1f1d7 (with deploy sha 71072f8a)
19:58 RobH: ssl1001 back in service, ssl1002-1003 set to disabled in pybal
19:18 RobH: depooling ssl1001 to test new certs live on system
19:09 RobH: disabled puppet on cp40XX, ssl10XX, and ssl30XX
19:08 logmsgbot: bblack synchronized wmf-config/squid.php 'REVERT: Update wgSquidServersNoPurge to use whole subnets for XFF checking'
19:07 logmsgbot: bblack updated /a/common to Iaf4d57d54: Revert "Use whole subnets in squid.php list for XFF acceptance"
19:03 logmsgbot: bblack synchronized wmf-config/squid.php 'Update wgSquidServersNoPurge to use whole subnets for XFF checking'
19:01 logmsgbot: bblack updated /a/common to I5a2d86ef0: Use whole subnets in squid.php list for XFF acceptance
17:05 logmsgbot: aaron synchronized wmf-config/CommonSettings.php 'Revert "Increased htmlCacheUpdate throttle"'
16:00 logmsgbot: anomie synchronized php-1.24wmf3/extensions/MobileFrontend/ 'SWAT: Backport change 131237 to 1.24wmf3 to fix bug in MobileFrontend'
15:59 logmsgbot: anomie synchronized php-1.24wmf2/extensions/MobileFrontend/ 'SWAT: Backport change 131237 to 1.24wmf2 to fix bug in MobileFrontend'
15:49 logmsgbot: anomie synchronized php-1.24wmf2/includes/specials/SpecialAllmessages.php 'SWAT: Backport change 131041 to 1.24wmf2 to fix bug in Special:AllMessages'
15:37 logmsgbot: anomie synchronized php-1.24wmf2/includes/specials/SpecialAllmessages.php 'SWAT: Backport change 131041 to 1.24wmf2 to fix bug in Special:AllMessages'
15:24 logmsgbot: anomie synchronized php-1.24wmf3/includes/specials/SpecialAllmessages.php 'SWAT: Backport change 131041 to 1.24wmf3 to fix bug in Special:AllMessages'
15:12 logmsgbot: anomie synchronized php-1.24wmf3/includes/api/ApiLogin.php 'SWAT: Backport change 131056 to 1.24wmf3 to fix bug 64727'
15:10 logmsgbot: anomie synchronized php-1.24wmf2/includes/api/ApiLogin.php 'SWAT: Backport change 131056 to 1.24wmf2 to fix bug 64727'
12:45 akosiaris: removing various sdtpa devices from LibreNMS
03:12 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon May 5 03:11:15 UTC 2014 (duration 11m 14s)
02:32 ^demon|away: antimony: ran very very aggressive repacking on mediawiki/core, operations/puppet, mediawiki/extensions/{UploadWizard,CentralAuth,CentralNotice,DonationInterface,FlaggedRevs,AbuseFilter,BlueSpiceExtensions,Translate,WikimediaMessages,EducationProgram,UniversalLanguageSelector,Wikibase}, pywikibot/{core,compat}, operations/dumps/tests. Basically anything taking up >90MB on disk. Probably not the cause of gitblit's wonkiness but they're certainly not helping matters.
02:26 logmsgbot: LocalisationUpdate completed (1.24wmf3) at 2014-05-05 02:25:34+00:00
02:14 logmsgbot: LocalisationUpdate completed (1.24wmf2) at 2014-05-05 02:13:18+00:00

May 4

20:57 logmsgbot: aaron synchronized php-1.24wmf3/thumb.php 'c5ebd2aefce9e3fc5b994053078754021176f411'
20:40 logmsgbot: aaron synchronized php-1.24wmf3/thumb.php '6c230cbbc6ffa4d8909e88961ebf75755cf9c9d9'
19:24 logmsgbot: ori updated /a/common to I2916ef3bd: labs: stream recent changes to redis
09:58 _joe_: restarted gitblit, stuck on GC as usual.
08:40 _joe_: restarted apache on tungsten as it was stuck communicating with uwsgi
03:09 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun May 4 03:08:41 UTC 2014 (duration 8m 40s)
02:26 logmsgbot: LocalisationUpdate completed (1.24wmf3) at 2014-05-04 02:25:06+00:00
02:14 logmsgbot: LocalisationUpdate completed (1.24wmf2) at 2014-05-04 02:12:57+00:00

May 3

19:41 logmsgbot: hoo synchronized wmf-config/ 'Documentation only change'
06:02 ori: disabled puppet on osmium to test hhvm build
03:10 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat May 3 03:09:40 UTC 2014 (duration 9m 39s)
02:27 logmsgbot: LocalisationUpdate completed (1.24wmf3) at 2014-05-03 02:26:41+00:00
02:15 logmsgbot: LocalisationUpdate completed (1.24wmf2) at 2014-05-03 02:14:43+00:00

May 2

23:13 logmsgbot: aaron synchronized wmf-config/filebackend.php 'Made private wikis use thumb_handler.php for thumbnails'
22:30 logmsgbot: aaron synchronized wmf-config/filebackend.php 'Removed useless "handlerUrl" config'
21:21 hashar: Jenkins is back :-]
21:14 hashar: restarting Jenkins (making sure the java process properly disappears)
21:14 hashar: zuul jenkins stuck again :(
20:31 logmsgbot: krinkle synchronized php-1.24wmf3/resources/Resources.php 'Ia12998fb11c686'
20:29 logmsgbot: krinkle synchronized php-1.24wmf2/resources/Resources.php 'Ia12998fb11c686'
17:47 RoanKattouw: Restarting stuck jenkins
16:04 _joe_: depooled mw1053 for hardware problems
15:13 andrewbogott: resetting a bunch more UIDs. Running find-and-chown again, but this time not on the swifts: salt -E '^(?!ms-be|labstore|snapshot).*$'
13:46 paravoid: swift @ eqiad: setting zone 5 (ms-be1013/1014/1015) to weight 2000, i.e. 66%
12:06 hashar: updated our Jenkins Job Builder copy abbf318..8df6bab
06:17 ori: re-enabled puppet on osmium and hafnium
04:02 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri May 2 04:01:04 UTC 2014 (duration 1m 3s)
03:10 logmsgbot: LocalisationUpdate completed (1.24wmf3) at 2014-05-02 03:09:16+00:00
02:40 logmsgbot: LocalisationUpdate completed (1.24wmf2) at 2014-05-02 02:39:39+00:00

May 1

23:43 bd808: Restarted logstash on logstash1001; MaxSem noticed that many recursion-guard logs were not being completely reassembled and JVM had one CPU maxed out.
23:08 logmsgbot: maxsem synchronized php-1.24wmf2/extensions/CommonsMetadata/ 'https://gerrit.wikimedia.org/r/#/c/130971/'
22:58 paravoid: disabling puppet on holmium; manually overriding completely broken varnish config
20:10 bd808: Deployed scap 92ea0e9 via trebuchet (not actively used yet)
19:17 logmsgbot: reedy synchronized wmf-config/
18:55 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.24wmf3
18:42 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikipedias to 1.24wmf2
18:07 manybubbles: upgrading highlighter plugin on elasticsearch machines - the cluster will go yellow for a few hours during the rolling restart
16:43 logmsgbot: reedy Finished scap: testwiki to 1.24wmf3 (duration: 29m 54s)
16:18 subbu: deployed parsoid 5e05c585 (with deploy sha ca2db96d)
16:16 ottomata: reinstalling elastic1008
16:13 logmsgbot: reedy Started scap: testwiki to 1.24wmf3
16:10 logmsgbot: reedy updated /a/common to I832b45db6: Correct a domain in wgCopyUploadsDomains
16:01 logmsgbot: manybubbles synchronized wmf-config/InitialiseSettings.php 'Enable cirrus as a betafeature on all wikis which did not already have it.'
15:51 logmsgbot: manybubbles synchronized wmf-config/InitialiseSettings.php 'SWAT fix GWtoolset url and add some more logos'
15:40 logmsgbot: manybubbles synchronized wmf-config/InitialiseSettings.php 'SWAT fix GWtoolset url and add some more logos'
15:35 andrewbogott: reassigning a ton of UIDs in production; running a couple dozen 'find' commands to chown files
15:34 logmsgbot: manybubbles synchronized php-1.24wmf2/includes/Article.php 'SWAT update to prevent fatal in backwards compatibility method'
15:27 logmsgbot: manybubbles synchronized php-1.24wmf2/extensions/VisualEditor/ 'SWAT update for firefox focus'
15:08 logmsgbot: manybubbles synchronized php-1.24wmf2/extensions/Wikidata/ 'SWAT update for time parsing and formatting'
08:15 springle: switching s1-analytics-slave db1047 enwiki to tokudb
03:18 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu May 1 03:17:39 UTC 2014 (duration 17m 38s)
02:34 logmsgbot: LocalisationUpdate completed (1.24wmf2) at 2014-05-01 02:33:34+00:00
02:22 logmsgbot: LocalisationUpdate completed (1.24wmf1) at 2014-05-01 02:21:44+00:00

April 30

23:58 bblack: mobile caches now sync zero carriers/proxies from zero.wm.org rather than noc(fenari) temp hack solution
23:14 logmsgbot: ori synchronized php-1.24wmf2/extensions/VisualEditor 'Ibaf0cc823bfe: Update VisualEditor for cherry-picks'
20:34 logmsgbot: yurik synchronized php-1.24wmf2/extensions/ZeroRatedMobileAccess/
20:31 logmsgbot: yurik synchronized php-1.24wmf1/extensions/ZeroRatedMobileAccess/
18:08 logmsgbot: yurik synchronized wmf-config/mobile.php
17:37 yurikR: yurik Added $wmgZeroRatedMobileAccessApiUserName / password
17:36 logmsgbot: yurik synchronized wmf-config/PrivateSettings.php
17:17 logmsgbot: yurik synchronized php-1.24wmf2/extensions/ZeroRatedMobileAccess/
17:11 logmsgbot: yurik synchronized php-1.24wmf1/extensions/ZeroRatedMobileAccess/
17:10 logmsgbot: aaron synchronized php-1.24wmf2/thumb.php '93a33d733fa81a9a5396083ded6aa28a74f08a98'
17:07 logmsgbot: aaron synchronized wmf-config/PoolCounterSettings-eqiad.php 'Removed "TMHTransformFrame" pool counter entry'
16:19 Krinkle: restarting stuck Jenkins
16:16 Krinkle: Deploying Id248bd6706f32a on Zuul and reloading service
09:29 springle: dbstore100[12] replicating m2 eventlogging
06:54 logmsgbot: aaron synchronized wmf-config/PoolCounterSettings-eqiad.php 'Removed GetLocalFileCopy pool counter entry'
03:23 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Apr 30 03:23:13 UTC 2014 (duration 23m 12s)
02:33 logmsgbot: LocalisationUpdate completed (1.24wmf2) at 2014-04-30 02:33:12+00:00
02:22 logmsgbot: LocalisationUpdate completed (1.24wmf1) at 2014-04-30 02:22:23+00:00
01:58 Krinkle: Deploying Ia82779635d762a3 on Zuul, reloading services
00:02 logmsgbot: ebernhardson synchronized php-1.24wmf1/extensions/MultimediaViewer/ 'I684d44a0b5'

April 29

23:59 logmsgbot: ebernhardson synchronized php-1.24wmf2/extensions/Wikidata/ 'I84c2283e07'
23:52 logmsgbot: ebernhardson synchronized php-1.24wmf2/extensions/MultimediaViewer/ 'I84f8e347f'
23:37 logmsgbot: ori synchronized php-1.24wmf2/skins 'I66c56c577bad'
23:37 logmsgbot: ori synchronized php-1.24wmf2/extensions/VisualEditor 'I5818dce62'
23:25 logmsgbot: ebernhardson synchronized wmf-config/InitialiseSettings.php 'Enable MediaViewer survey on Spanish and Dutch Wikipedia'
23:22 logmsgbot: ori synchronized wmf-config/InitialiseSettings.php 'I59e1fa87e: Include language-0 categories for betawikiversity'
23:21 logmsgbot: ori updated /a/common to I59e1fa87e: Include language-0 categories for betawikiversity
23:16 logmsgbot: ebernhardson synchronized php-1.24wmf2/extensions/WikiEditor 'Update WikiEditor to 1.24wmf2'
21:59 logmsgbot: spage synchronized wmf-config/InitialiseSettings.php 'enable Flow on two mw talk pages for James_F'
21:25 Krinkle: Running deleteEqualMessages.php on newwiki (bug 43917)
21:12 AaronSchulz: populateImageSha1 fixer script finished on all wikis
20:47 manybubbles: rebuilding search indexes for group1 wikis after the train upgraded cirrus for them
20:13 logmsgbot: ori synchronized wmf-config/InitialiseSettings.php 'I56ae921ca: Change group name on the Persian Wikipedia'
20:13 logmsgbot: ori updated /a/common to I56ae921ca: Change group name on the Persian Wikipedia
18:16 logmsgbot: reedy synchronized multiversion/
18:09 logmsgbot: reedy synchronized wmf-config/ 'I52293b29a87e2c645735b37215e4113e561e47da'
18:04 logmsgbot: reedy synchronized docroot and w
18:03 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non wikipedias to 1.24wmf2
17:13 manybubbles: raising number of replicas of enwiki's cirrus index from 1 to 2. cluster will probably complain while they allocate
16:53 RobH: osmium install complete, ticket resolved, ready for ^d and ori to take over
16:29 logmsgbot: reedy updated /a/common to Idb2a86791: Increased htmlCacheUpdate throttle
16:18 logmsgbot: aaron synchronized wmf-config/CommonSettings.php 'Increased htmlCacheUpdate throttle'
16:05 logmsgbot: manybubbles synchronized php-1.24wmf2/extensions/Wikidata/ 'SWAT upgrade wikidata for date parsing fixes'
15:47 manybubbles: rebuilding test2wiki's cirrus index after swat deploy
15:45 logmsgbot: manybubbles synchronized wmf-config/InitialiseSettings.php 'SWAT add autopatrolled group to shwiktionary and draft namespace to chapcomwiki'
15:38 logmsgbot: manybubbles synchronized wmf-config/CirrusSearch-common.php 'SWAT deploy - move group0 wikis to experimental highlighter and give enwiki its redundancy back'
15:37 logmsgbot: manybubbles synchronized wmf-config/InitialiseSettings.php 'SWAT deploy - extra setting for cirrus and new groups and sources for gwtoolset'
15:32 manybubbles: cirrus deploys look good, moving on to twkozlowski's requests
15:31 logmsgbot: manybubbles synchronized wmf-config/CirrusSearch-common.php 'SWAT deploy - move group0 wikis to experimental highlighter and give enwiki its redundancy back'
15:31 logmsgbot: manybubbles synchronized wmf-config/InitialiseSettings.php 'SWAT deploy - extra setting for cirrus'
15:14 logmsgbot: manybubbles synchronized php-1.24wmf2/extensions/CirrusSearch/ 'SWAT upgrade - improves as yet undeployed highlighter config'
14:40 logmsgbot: faidon synchronized php-1.24wmf2/extensions/GettingStarted/ 'Revert GettingStarted anon tokens'
14:39 logmsgbot: faidon synchronized php-1.24wmf1/extensions/GettingStarted/ 'Revert GettingStarted anon tokens'
13:45 Krinkle: Running deleteEqualMessages.php on lnwiki (bug 43917)
13:41 logmsgbot: bblack synchronized wmf-config/InitialiseSettings.php 'Revert "Unset $wgUseXVO"'
13:10 Krinkle: Running deleteEqualMessages.php on nlwiki (bug 43917)
12:35 Krinkle: Running deleteEqualMessages.php on cswikiversity (bug 43917)
12:08 Krinkle: Running deleteEqualMessages.php on cswiktionary (bug 43917)
11:31 logmsgbot: bblack synchronized wmf-config/InitialiseSettings.php 'Revert "Unset $wgUseXVO"'
10:34 akosiaris: restarted gitblit on antimony
10:20 hashar: restarting Zuul
10:14 hashar: Jenkins / Zuul : upgrading python-gear from 0.4.0-1 to 0.5.4-1 . Should fix a bunch of job registration issues in Zuul Gearman. bug 63760
09:59 akosiaris: update python-gear on apt.wikimedia.org to 0.5.4-1
08:30 akosiaris: Published carbon's IPv6 address in DNS. apt.wikimedia.org and ubuntu.wikimedia.org are now IPv6 enabled
05:25 AaronSchulz: Manually removed a few 10000s of duplicate Cyberbot jobs
03:48 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Apr 29 03:48:29 UTC 2014 (duration 48m 28s)
03:02 logmsgbot: LocalisationUpdate completed (1.24wmf2) at 2014-04-29 03:02:55+00:00
02:37 logmsgbot: LocalisationUpdate completed (1.24wmf1) at 2014-04-29 02:37:21+00:00
02:24 bblack: wiped disk cache (via mkfs) on cp1055 to (hopefully) clear crash-restart cycle, backend back in service now
01:53 Krinkle: Running deleteEqualMessages.php on cswiki (bug 43917)
00:49 Tim: on cp1055: backend varnish is continually panicking and restarting its child, will try to stop/start service
00:22 logmsgbot: aaron synchronized php-1.24wmf1/includes/WikiPage.php '3505cf933d874ea44bd5a3f3ffe210598ef7eec2'
00:14 logmsgbot: aaron synchronized php-1.24wmf2/includes/WikiPage.php '119fd9fc17b3c309b9065b54f4c83ede7d20498b'
00:00 logmsgbot: mwalker Finished scap: SWAT for 129813, 129640, 129708, 129707, and 130246 (duration: 11m 37s)

April 28

23:49 logmsgbot: mwalker Started scap: SWAT for 129813, 129640, 129708, 129707, and 130246
23:26 logmsgbot: aaron synchronized php-1.24wmf2/maintenance/runJobs.php '91dddcaffa58430204e2bf3c612d893b2710f33b'
22:43 jgage: rebooting db1047: unpingable and unresponsive on mgmt console
22:28 logmsgbot: mflaschen synchronized php-1.24wmf2/extensions/GettingStarted/ 'Revert token/TrackedPageContentSaveComplete GettingStarted change'
22:28 logmsgbot: mflaschen synchronized php-1.24wmf1/extensions/GettingStarted/ 'Revert token/TrackedPageContentSaveComplete GettingStarted change'
20:47 Krinkle: Running deleteEqualMessages.php on sqwiki (bug 43917)
20:38 paravoid: apache-graceful-all after tuning php.ini's expose_php setting
20:12 logmsgbot: reedy synchronized wmf-config/db-labs.php
20:08 apergos: restarted gmetad on nickel
20:02 gwicke: deployed Parsoid cab9348e using deploy 9e9030d
20:00 logmsgbot: mflaschen synchronized wmf-config/InitialiseSettings.php 'Update GettingStarted config for new format'
19:59 logmsgbot: mflaschen synchronized php-1.24wmf2/extensions/GettingStarted/ 'Sync GettingStarted for Growth team deploy'
19:58 logmsgbot: mflaschen synchronized php-1.24wmf1/extensions/GettingStarted/ 'Sync GettingStarted for Growth team deploy'
19:30 Krinkle: Running deleteEqualMessages.php on simplewiki (bug 43917)
19:25 logmsgbot: ori synchronized wmf-config/InitialiseSettings.php 'I5e0709ef0: Unset $wgUseXVO'
19:25 logmsgbot: ori updated /a/common to I5e0709ef0: Unset $wgUseXVO
19:22 Krinkle: Running deleteEqualMessages.php on rowiktionary (bug 43917)
19:01 Krinkle: Running deleteEqualMessages.php on bat-smgwiki (bug 43917)
18:48 Krinkle: Running deleteEqualMessages.php on afwikiquote (bug 43917)
18:41 hashar: Jenkins disconnected lanthanum slave, killed all jenkins-slave process on it and repooled server.
18:39 Krinkle: Running deleteEqualMessages.php on abwiki (bug 43917)
17:54 manybubbles: deploying a new version of our Elasticsearch highlighter by doing a rolling restart on Elasticsearch machines - should cause no interruption of service
16:51 akosiaris: executed graceful-stop, start for apaches in order to load the new php-luasandbox apache module
15:08 logmsgbot: manybubbles synchronized wmf-config/InitialiseSettings.php 'Add new sources to gwtoolset and namespaces to hewikisource'
12:48 _joe_: restarted apache on wikitech-static
12:29 Krinkle: Running deleteEqualMessages.php on cvwiki (bug 43917)
12:29 Krinkle: Running deleteEqualMessages.php on afwiki (bug 43917)
11:46 Krinkle: Running deleteEqualMessages.php on bpywiki (bug 43917)
11:00 springle: completed schema change, bug 64411, page_props.pp_sortkey
08:49 springle: reloading db1046 from fresh m2 dump
03:12 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Apr 28 03:12:11 UTC 2014 (duration 12m 10s)
03:00 springle: starting online schema change, bug 64411, page_props.pp_sortkey
02:30 logmsgbot: LocalisationUpdate completed (1.24wmf2) at 2014-04-28 02:30:04+00:00
02:22 springle: powercycle db1046 unresponsive
02:20 logmsgbot: LocalisationUpdate completed (1.24wmf1) at 2014-04-28 02:20:39+00:00

April 27

14:49 paravoid: stopping pybal on lvs300[1-4] to avoid the logspam
05:31 springle: mariadb sql dump in progress db1048 /a for rebuilding db1046. ok to kill if necessary
03:08 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Apr 27 03:08:01 UTC 2014 (duration 8m 0s)
02:28 logmsgbot: LocalisationUpdate completed (1.24wmf2) at 2014-04-27 02:28:32+00:00
02:19 logmsgbot: LocalisationUpdate completed (1.24wmf1) at 2014-04-27 02:19:43+00:00

April 26

23:27 logmsgbot: aaron synchronized php-1.24wmf2/includes/profiler/Profiler.php '7e20cdd2ba0381b81d3b43c8743fa4202a76bd61'
13:43 logmsgbot: hoo synchronized wmf-config/InitialiseSettings-labs.php 'Syncing for cluster consistency'
13:42 logmsgbot: hoo updated /a/common to Ic98928d54: Have Commons on Beta Labs use $stdlogo
13:26 springle: db1016 xfs head behind tail. reverted to last snapshot volume
12:57 springle: powercycle db1016 unresponsive
03:11 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Apr 26 03:11:16 UTC 2014 (duration 11m 15s)
02:31 logmsgbot: LocalisationUpdate completed (1.24wmf2) at 2014-04-26 02:31:41+00:00
02:22 logmsgbot: LocalisationUpdate completed (1.24wmf1) at 2014-04-26 02:22:41+00:00

April 25

20:29 logmsgbot: mwalker synchronized php-1.24wmf2/extensions/WikiEditor/ 'Reverting some faulty WikiEditor code for bug 64289'
20:28 logmsgbot: mwalker synchronized php-1.24wmf1/extensions/WikiEditor/ 'Reverting some faulty WikiEditor code for bug 64289'
18:02 K4-713: adjusted antifraud filters on payments
17:08 Jeff_Green: reenabled puppet and notifications for iodine
16:22 manybubbles: Elasticsearch rolling restart complete.
14:46 Jeff_Green: disabled icinga notifications for iodine too...
14:44 Jeff_Green: puppet stopped on iodine, doing manual spamassassin training
12:58 springle: upgrading db1047 (analytics slave) to mariadb 10
12:28 manybubbles: Performing rolling restart of Cirrus's Elasticsearch servers to upgrade a plugin. Low risk because it won't be used by the general public until Mondayish so a Friday push should be ok.
12:07 ottomata: stopping puppet on analytics1026 to test more frequent runs of Camus
12:02 logmsgbot: reedy synchronized wmf-config/
11:35 logmsgbot: reedy synchronized docroot and w
09:41 logmsgbot: reedy synchronized wmf-config/
09:15 logmsgbot: reedy synchronized wmf-config/ 'I4a68dc8321b7b302f5e89b5adafcff096f2ac35b'
09:13 logmsgbot: reedy synchronized multiversion/ 'I4a68dc8321b7b302f5e89b5adafcff096f2ac35b'
08:43 logmsgbot: reedy synchronized docroot and w
08:39 logmsgbot: reedy updated /a/common to I57b6d055e: Update flow cache version to 4.2
07:22 springle: up to 5x pt-table-sync running on db1048 m2 master for eventlogging migration. ok to kill if necessary
03:44 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Apr 25 03:44:16 UTC 2014 (duration 44m 15s)
03:03 logmsgbot: LocalisationUpdate completed (1.24wmf2) at 2014-04-25 03:03:33+00:00
02:37 logmsgbot: LocalisationUpdate completed (1.24wmf1) at 2014-04-25 02:37:06+00:00
00:47 mwalker: updating payments servers from e6d188f0dfcd57406acb58aa2b5bf45e48117c33 to a7fa0d64da2c56586c83cf92babb65bac857be2e for worldpay
00:29 bd808|deploy: Ori was able to fix permissions and second scap test worked as expected
00:28 logmsgbot: bd808 Finished scap: no-op scap to validate I24149ab and Ie967901 (try 2) (duration: 05m 02s)
00:26 bd808|deploy: File permissions on /srv/scap/scap/*.{py,pyc} were not consistently a+r which is needed for scap-rebuild-cdbs
00:23 logmsgbot: bd808 Started scap: no-op scap to validate I24149ab and Ie967901 (try 2)

April 24

23:55 logmsgbot: bd808 Finished scap: no-op scap to validate I24149ab and Ie967901 (duration: 02m 51s)
23:52 logmsgbot: bd808 Started scap: no-op scap to validate I24149ab and Ie967901
23:46 awight: perform crm schema update 7018
23:36 bd808|deploy: Running scap-rebuild-cdbs on tin to test python port
23:17 logmsgbot: mwalker synchronized wmf-config/CommonSettings.php 'Updating flow configuration 129589'
23:16 logmsgbot: mwalker synchronized php-1.24wmf2/extensions/Flow 'Updating flow for 129589 and 129604'
22:53 AaronSchulz: Running PopulateImageSha1.php for all multi-versioned files on all wikis to fix broken SHA-1s
21:22 springle: eventlogging dump loading on db1048 m2 master in screen. ok to kill if necessary
21:18 hashar: restarting Zuul
21:01 mwalker: updating payments from 4811f6d3d80d126c8b3c89c11d20cc6416cb58f6 to e6d188f0dfcd57406acb58aa2b5bf45e48117c33 for donationinterface / worldpay updates
20:39 paravoid: shutting down sdtpa, cr1-sdtpa, csw1-sdtpa, msw1-sdtpa and other sdtpa hosts gone forever
20:37 Coren: sync-apache for 126969 and 91339
19:59 logmsgbot: reedy synchronized docroot and w
19:54 logmsgbot: aaron synchronized wmf-config/PoolCounterSettings-eqiad.php 'Removed redundant pool counter config'
19:00 ori: eventlogging data streaming into db1048; db1047 consumer decom'd.
19:00 logmsgbot: reedy synchronized wmf-config/
18:59 logmsgbot: reedy synchronized database lists files:
18:43 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Rest of group0 to 1.24wmf2
18:38 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias to 1.24wmf1
18:33 springle: begin eventlogging migration db1047 to db1048 (m2), RT #7081
16:42 hashar: restarted both Zuul and Jenkins
16:29 logmsgbot: reedy synchronized php-1.24wmf2/extensions/Wikidata 'I43988505ea0fd7ac6b1278a50237e0e1d3ee0e9e'
16:25 hashar: restarting Zuul. Got asked to be stopped.
16:15 hashar: Jenkins ended up being stalled due to a known unfigured out issue :/
16:13 hashar: Killed a leftover jenkins process on gallium
15:50 akosiaris: scheduled a safe restart of jenkins
15:48 logmsgbot: hoo updated /a/common to I53de8d84b: Add two languages not supported by MediaWiki to testwikidata
15:48 logmsgbot: hoo synchronized wmf-config/InitialiseSettings.php 'Add two languages not supported by MediaWiki to testwikidata'
15:47 andrewbogott: zuul on gallium is dead and I don't know why
15:45 andrewbogott: restarted jenkins and zuul on gallium
14:02 logmsgbot: reedy Finished scap: testwiki to 1.24wmf2 build l10n cache (take 2) (duration: 15m 46s)
13:46 logmsgbot: reedy Started scap: testwiki to 1.24wmf2 build l10n cache (take 2)
13:45 logmsgbot: reedy Finished scap: testwiki to 1.24wmf2 build l10n cache (duration: 08m 16s)
13:37 logmsgbot: reedy Started scap: testwiki to 1.24wmf2 build l10n cache
13:32 logmsgbot: reedy updated /a/common to I543df75e3: Remove $wgDisableTextSearch and $wgDisableSearchUpdate overrides.
12:22 akosiaris: restarted morebots after upgrade of adminbot to 1.7.5
12:00 Krinkle: Running deleteEqualMessages.php on alswiki (bug 43917)
12:00 Krinkle: Running deleteEqualMessages.php on suwiki (bug 43917)
12:00 Krinkle: Running deleteEqualMessages.php on tlwiki (bug 43917)
11:59 Krinkle: Running deleteEqualMessages.php on nahwiktionary (bug 43917)
11:53 paravoid: reenabling ospf3 between cr1-eqiad/cr2-knams
11:50 paravoid: fixing private4/private6 ACLs to be consistent across all routers
06:47 _joe_: also ran puppet node clean to revoke certs, facts, etc (cp3013.esams.wikimedia.org cp3014.esams.wikimedia.org)
06:38 _joe_: ran puppetstoredconfigclean.rb for cp3013.esams.wikimedia.org cp3014.esams.wikimedia.org
05:20 springle: xtrabackup dbstore1001 to dbstore1002
03:29 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Apr 24 03:29:12 UTC 2014 (duration 29m 11s)
02:46 logmsgbot: LocalisationUpdate completed (1.24wmf1) at 2014-04-24 02:45:58+00:00
02:29 logmsgbot: LocalisationUpdate completed (1.23wmf22) at 2014-04-24 02:29:14+00:00

April 23

23:40 logmsgbot: catrope synchronized php-1.24wmf1/extensions/VisualEditor/lib/ve/modules/ve/ui/widgets/ve.ui.SurfaceWidget.js 'Fix surface focusing bug in Firefox'
23:39 logmsgbot: catrope synchronized php-1.24wmf1/extensions/VisualEditor/modules/ve-mw/init/ve.init.mw.Target.js 'Unbreak badtoken recovery in mobile VE'
23:38 logmsgbot: catrope synchronized php-1.24wmf1/resources/src/jquery/jquery.suggestions.js 'Handle CSS ellipsis when calculating suggestions widths'
23:37 logmsgbot: catrope synchronized php-1.23wmf22/resources/src/jquery/jquery.suggestions.js 'Handle CSS ellipsis when calculating suggestions widths'
21:55 bblack: moved cp301[34] ethernet ports to private1-esams
20:14 Krinkle: Running deleteEqualMessages.php on mlwiki (bug 43917)
20:14 Krinkle: Running deleteEqualMessages.php on miwiki (bug 43917)
20:11 Krinkle: Running deleteEqualMessages.php on gvwiki (bug 43917)
20:11 Krinkle: Running deleteEqualMessages.php on euwiktionary (bug 43917)
20:04 subbu: deployed Parsoid 9c99b0be (deploy SHA cf5eb4d0)
19:51 Krinkle: Running deleteEqualMessages.php on brwiki (bug 43917)
19:51 Krinkle: Running deleteEqualMessages.php on afwiki (bug 43917)
19:50 Krinkle: Running deleteEqualMessages.php on iawiki (bug 43917)
19:37 Krinkle: Running deleteEqualMessages.php on hrwiktionary (bug 43917)
19:37 Krinkle: Running deleteEqualMessages.php on hrwiki (bug 43917)
19:34 Krinkle: Running deleteEqualMessages.php on dawiki (bug 43917)
19:00 logmsgbot: reedy synchronized wmf-config/ 'I543df75e364171a71a48f18429972b662b542894'
18:58 logmsgbot: reedy updated /a/common to I865a08779: Fix $wmgBetaFeaturesWhitelist for labs too
18:58 Krinkle: Running deleteEqualMessages.php on amwiki (bug 43917)
17:59 logmsgbot: demon synchronized wmf-config/InitialiseSettings-labs.php 'no-op in prod, for completeness'
17:52 logmsgbot: demon synchronized all-labs.dblist 'no-op for prod, syncing for completeness'
17:52 logmsgbot: demon synchronized wikiversions-labs.json 'no-op for prod, syncing for completeness'
17:51 logmsgbot: demon updated /a/common to I960a792bc: Override wgSearchTypeAlternatives for beta to remove lucene
15:58 akosiaris: updated adminbot on apt.wikimedia.org to 1.7.5
15:57 manybubbles: rebuilding commons' cirrus search index
15:55 logmsgbot: manybubbles synchronized php-1.24wmf1/extensions/CirrusSearch/maintenance/updateOneSearchIndexConfig.php 'swat update to fix maintenance script'
15:46 ottomata: temporarily disabling puppet on analytics1003 to test some kafkatee settings
15:32 apergos: rebooting dataset2 hoping to detect the arrays on reboot
14:40 akosiaris: unexported /vol/{originals,thumbs} on nas1001-a, nas1-a
14:30 akosiaris: break replication for volumes originals, thumbs on nas1001-a, nas1-a
14:03 paravoid: adding AS path 1257 6830 (Tele2 -> UPC) to avoided paths @ cr1-esams/cr2-knams, multiple users reporting slowness issues
13:52 akosiaris: unmounted /vol/originals and /vol/thumbs on fenari (was /mnt/upload7, /mnt/thumbs2) see RT #7076
12:54 logmsgbot: marc synchronized wmf-config/interwiki.cdb 'Updating interwiki cache'
12:50 hashar: Jenkins back
12:46 hashar: restarting jenkins
12:46 hashar: Jenkins: upgrading email-ext and JobConfigHistory plugins (the latter now supports slaves configs!)
10:35 hashar: Jenkins: update lanthanum slave agent to use java7
10:32 hashar: Jenkins switching integration-slave1001.eqiad.wmflabs java to use Java 7 . In https://integration.wikimedia.org/ci/computer/integration-slave1001/configure changed JavaPath to /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java
07:45 mutante: nfs1 - delete some old kernels and zip mw logs last touched in 2012/13 to free some disk on /
07:32 mutante: nfs1 - re-enabled puppet
07:12 mutante: nfs2 - revoke puppet cert,salt key,stored configs
06:26 mutante: db48,db63 - revoke puppet cert, salt key, kill from storedconfigs
03:55 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Apr 23 03:55:03 UTC 2014 (duration 55m 2s)
03:09 logmsgbot: LocalisationUpdate completed (1.24wmf1) at 2014-04-23 03:08:58+00:00
02:46 logmsgbot: LocalisationUpdate completed (1.23wmf22) at 2014-04-23 02:46:47+00:00
02:25 manybubbles: restarted rebuilding commons' Cirrus index after something crashed. going to get more logging out of it if it crashes again. or it'll work. Either way. Like last time the Elasticsearch check might freak out for a bit after it finishes because shards are assigning. That can be ignored for an hour or so.

April 22

23:30 logmsgbot: ori Finished scap: I595446dc5, If2c57846f, Iaa232298e (duration: 00m 45s)
23:30 logmsgbot: ori Started scap: I595446dc5, If2c57846f, Iaa232298e
23:28 logmsgbot: ori synchronized php-1.24wmf1/extensions/EventLogging 'Update EventLogging for Iaa232298e: Set line-height for code icon on schema pages (bug 64251)'
23:27 logmsgbot: ori synchronized wmf-config/InitialiseSettings.php 'If2c57846f: Enable survey option in MediaViewer on a few more wikis'
23:26 logmsgbot: ori updated /a/common to If2c57846f: Enable survey option in MediaViewer on a few more wikis
23:24 logmsgbot: ori synchronized php-1.23wmf22/extensions/MultimediaViewer 'Update MultimediaViewer for I595446dc5: Add more survey languages (fr, de, pt/pr-br)'
23:23 logmsgbot: ori synchronized php-1.24wmf1/extensions/MultimediaViewer 'Update MultimediaViewer for I595446dc5: Add more survey languages (fr, de, pt/pr-br)'
22:28 logmsgbot: aaron synchronized php-1.24wmf1/maintenance/populateImageSha1.php '32d9206'
22:22 logmsgbot: reedy synchronized php-1.24wmf1/extensions/TimedMediaHandler 'I7483c8b7ec75f5149998da2b530ca04'
21:31 logmsgbot: spage synchronized php-1.24wmf1/extensions/Flow/modules/discussion/styles/mixins/collapse.less 'Fix Flow collapsed topics on mw.org'
21:14 logmsgbot: spage synchronized wmf-config/InitialiseSettings.php 'Enable Flow on Compact Personal Bar talk'
21:12 logmsgbot: spage updated /a/common to I851651247: Non wikipedias to 1.24wmf1
20:35 logmsgbot: demon synchronized wmf-config/CommonSettings.php 'No op in prod, disables lsearchd completely for beta'
20:28 ottomata: turning on varnishkafka on text varnishes
20:12 MatmaRex: wikibugs replaced by pywikibugs (https://github.com/valhallasw/pywikibugs) and moved to #wikimedia-dev (at last!)
20:12 manybubbles: rebuilding the search index for a few wikis - might cause the Elasticsearch health check to freak out because it sucks
19:51 MatmaRex: wikibugs is down, let's not bring it back up
19:31 Krinkle: Reloading Zuul to deploy config change I9c2f94b138244ab8
19:05 hashar: Jenkins killed Jenkins java process on deployment-bastion.eqiad.wmflabs to free up the executor and threads entirely.
18:55 hashar: restarted Zuul to clean up some stuck jobs from the queue
18:49 hashar: Jenkins deployment-bastion.eqiad.wmflabs is back online: Slave successfully connected and online
18:47 logmsgbot: reedy synchronized docroot and w
18:46 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: non wikipedias to 1.24wmf1
18:43 RobH: tridge back and accessible
18:42 hashar: Jenkins depooling / repooling deployment-bastion.eqiad.wmflabs slave; it locked up somehow, the executors are no longer taken into account by the Jenkins master
18:33 RobH: resurrecting tridge in pmtpa
18:00 RobH: tridge is coming down for relocation, shouldn't disrupt anything but backups in progress
17:52 bblack: disable cp301[34] (mobile varnish frontends) in pybal on fenari
17:21 awight: update crm from 7dafce5 to cfe34fe
16:45 mark: Reenabled Apache and puppet on fenari
16:40 logmsgbot: aaron synchronized php-1.24wmf1/includes/filerepo/file/LocalFile.php 'e9807d0'
16:15 logmsgbot: reedy updated /a/common to I55954c612: Commit updated interwiki.cdb file
15:47 logmsgbot: demon synchronized wmf-config/InitialiseSettings.php 'Collection back on, server move over'
15:14 logmsgbot: demon synchronized wmf-config/InitialiseSettings.php 'Icb6b4bad: Updated $wgForceUIMsgAsContentMsg for commonswiki'
14:59 cmjohnson1: shutting down and relocating virt0 and pdf2
14:50 logmsgbot: marc synchronized wmf-config/interwiki.cdb 'Updating interwiki cache'
14:41 logmsgbot: marc synchronized wmf-config/interwiki.cdb 'Updating interwiki cache'
14:33 manybubbles: populating cirrus indexes for all remaining wikis
14:19 akosiaris: added bblack account on all junipers
14:18 manybubbles: building new elasticsearch indexes for the last wikis that didn't have them. the cluster may go red as the indexes are assigned. silly nagios check.
14:15 logmsgbot: manybubbles synchronized wmf-config/ 'cirrus for more wikis and disable collection for more'
14:13 logmsgbot: manybubbles synchronized docroot/noc/createTxtFileSymlinks.sh 'noncirrus is removed'
14:09 cmjohnson1: mchenry and sanger going down for server relocation
13:26 mark: Disabled puppet and apache on fenari
13:25 paravoid: second pass of swiftrepl eqiad->esams
12:04 logmsgbot: faidon synchronized wmf-config/squid.php 'add cp3013/cp3014 IPv6 addresses'
12:04 logmsgbot: faidon updated /a/common to If8f39abee: squid.php: add cp3013/cp3014 IPv6 addresses
10:35 akosiaris: upgraded php-luasandbox to 1.9.1 on beta (deployment-apache0{1,2})
10:25 akosiaris: upgraded php-luasandbox to 1.9-1 on test.wikimedia.org
10:18 mutante: harmon - delete salt key
09:04 mutante: hooper - revoked puppet cert
08:47 mutante: upgrading Bugzilla to 4.4.4
06:58 logmsgbot: demon synchronized wmf-config/InitialiseSettings.php 'Unbreak $wmgBetaFeaturesWhitelist'
06:44 mutante: db77 - revoke puppet cert,salt key,rm from monitoring
05:14 springle: db68 down. s1-analytics-slave cname to db1007
04:35 paravoid: reactivate esams<->HE & eqiad<->HE peerings; issues are confirmed to be resolved
03:32 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Apr 22 03:32:29 UTC 2014 (duration 32m 28s)
02:43 logmsgbot: LocalisationUpdate completed (1.24wmf1) at 2014-04-22 02:43:01+00:00
02:30 logmsgbot: LocalisationUpdate completed (1.23wmf22) at 2014-04-22 02:29:34+00:00
01:16 awight: nevermind previous update.
01:11 awight: update from 7dafce5 to cfe34fe
00:18 logmsgbot: catrope synchronized php-1.24wmf1/extensions/MobileFrontend/javascripts/modules/editor/VisualEditorOverlay.js 'Fix JS error on save'
00:15 Reedy: torrus (on manutius) is down
00:05 RoanKattouw: Restarted ircecho on ekrem, IRC working again now
00:00 K4-713: updated payments from 2819549 --> 4811f6d

April 21

23:57 RoanKattouw: Started ircd on ekrem, startup doesn't seem to be puppetized
23:24 logmsgbot: catrope synchronized php-1.23wmf22/extensions/MultimediaViewer 'SWAT deploy cherry-picks'
23:24 logmsgbot: catrope synchronized php-1.24wmf1/extensions/MobileFrontend 'SWAT deploy cherry-picks'
23:23 logmsgbot: catrope synchronized php-1.24wmf1/extensions/MultimediaViewer 'SWAT deploy cherry-picks'
23:23 logmsgbot: catrope synchronized php-1.24wmf1/extensions/VisualEditor 'SWAT deploy cherry-picks'
23:16 logmsgbot: catrope synchronized wmf-config/InitialiseSettings.php 'Lithuanian namespace aliases for betawikiversity'
23:11 logmsgbot: catrope synchronized wmf-config/CommonSettings.php 'BetaFeatures whitelist'
23:11 logmsgbot: catrope synchronized wmf-config/InitialiseSettings.php 'Beta Features whitelist'
22:50 RoanKattouw: Restarted stuck Jenkins
21:33 awight: updated payments: af35b7b --> 2819549
20:45 andrewbogott: rebooted wtp1018
20:45 subbu: deployed Parsoid ec51e5d1 (deploy SHA 0dd607fc)
20:31 manybubbles: rolling restart on remaining Elasticsearch servers to get the plugin (1010, 1011, 1012, 1015)
19:45 manybubbles: rolling restart on more of the elasticsearch servers to pick up plugins (06, 07, 09)
19:15 cmjohnson1: shutting down and relocating dobson
19:15 cmjohnson1: shutting down and relocating linne
17:39 cmjohnson: shutting down and relocating fenari
17:37 cmjohnson: shutting down mexia to relocate to 12th floor
17:25 logmsgbot: aaron synchronized php-1.24wmf1/includes/filerepo/file/LocalFile.php '2026e4a'
17:22 logmsgbot: aaron synchronized wmf-config/PoolCounterSettings-eqiad.php 'Adjust large file download pool counter config to tie up less workers'
17:18 logmsgbot: aaron synchronized php-1.23wmf22/includes/filerepo/file/LocalFile.php '01ce288'
17:13 logmsgbot: demon synchronized wmf-config/InitialiseSettings.php 'renderfile-nonstandard throttle config'
17:03 logmsgbot: aaron synchronized php-1.24wmf1/thumb.php '44c4658'
17:02 logmsgbot: aaron synchronized php-1.23wmf22/thumb.php '9591365'
16:34 ottomata: reinstalling elastic1013 (elastic1014 is still coming back online, but I don't want there to be an extra eligible master for long)
16:04 ottomata: reinstalling elastic1014
15:22 cmjohnson1: dataset2 going down to be relocated to the 12th floor
13:41 manybubbles: rolling restart on some of the Elasticsearch servers to pick up new plugins. should not cause any trouble.
13:05 Reedy: De-activated status.wm.o monitor for icinga due to false positive from HTTP auth
12:54 paravoid: demoting myself, removing Commons crat/admin rights
12:41 paravoid: escalating myself to Commons bureaucrat/admin, then adding GWToolset privileges
12:40 paravoid: deleting 29 GWToolset XML under Swift's wikipedia-commons-gwtoolset-metadata container for user Fæ/
11:51 logmsgbot: reedy synchronized php-1.23wmf22/extensions/TimedMediaHandler 'I7483c8b7ec75f5149998da2b530ca04'
11:50 paravoid: deactivating esams<->HE peering, >90% packet loss between lon<->nyc
11:49 paravoid: deactivating eqiad<->HE peering, >90% packet loss between lon<->nyc
11:45 logmsgbot: reedy synchronized php-1.23wmf22/extensions/TimedMediaHandler 'I7483c8b7ec75f5149998da2b530ca0467ac70de7'
03:55 springle: reset pc100* slaves previously replicating from pmtpa
03:32 ori: 5.5k fatals over last 20 hrs, of which 3.5k are calls to doTransform() on a non-object at TimedMediaThumbnail.php:201, and 0.9k are Lua API OOMs at LuaSandbox/Engine.php:264
03:30 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Apr 21 03:30:15 UTC 2014 (duration 30m 14s)
03:26 ori: ap_busy_workers spike on image scalers eqiad, started ~2:55, subsided around ~3:20
02:42 logmsgbot: LocalisationUpdate completed (1.24wmf1) at 2014-04-21 02:42:30+00:00
02:29 logmsgbot: LocalisationUpdate completed (1.23wmf22) at 2014-04-21 02:29:49+00:00

April 20

18:51 ori: restarted grrrit-wm by following instructions on https://wikitech.wikimedia.org/wiki/Grrrit-wm#Restarting_the_bot
11:12 Nemo_bis: grrrit dead: 10.28 -!- grrrit-wm [tools.lolr@208.80.155.145] has quit
03:26 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Apr 20 03:26:48 UTC 2014 (duration 26m 47s)
02:39 logmsgbot: LocalisationUpdate completed (1.24wmf1) at 2014-04-20 02:39:23+00:00
02:28 logmsgbot: LocalisationUpdate completed (1.23wmf22) at 2014-04-20 02:28:09+00:00

April 19

03:27 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Apr 19 03:27:52 UTC 2014 (duration 27m 51s)
02:41 logmsgbot: LocalisationUpdate completed (1.24wmf1) at 2014-04-19 02:41:02+00:00
02:29 logmsgbot: LocalisationUpdate completed (1.23wmf22) at 2014-04-19 02:29:33+00:00

April 18

21:00 hashar: Jenkins renamed mw-jenkinsbot irc bot to wmf-insecte (french for "bug"). Updated IRC conf to point to chat.freenode.net:7000 with SSL.
19:02 bblack: enabled cp301[34] varnish mobile frontends in esams pybal
17:50 bblack: cp301[34] reinstalls complete, should stay ok in monitoring
17:48 ottomata: reinstalling elastic1008
16:20 springle: db48 mysqld shutdown for decom
16:20 bblack: ignore cp301[34] msgs, reinstalling them
16:10 springle: db63 mysqld shutdown for decom
15:53 ottomata: reinstalling elastic1007
15:52 springle: db48 mysqld set read_only, disabled m2 repl to db1048
15:51 ottomata: disabling puppet on elastic1007 and elastic1008 for reformatting
15:45 mutante: DNS update - removing Tampa msbe/msfe
15:38 Jeff_Green: switched mchenry to use db1048/db1049 for OTRS address lookups
15:24 mutante: DNS update - removing all the Tampa mw/srv mgmt
15:15 mutante: DNS update - removing lvs1-6
14:54 mutante: es5,es6 - revoke puppet certs, salt keys, icinga
14:51 ottomata: powering down stat1 for decom
14:43 mutante: ms-fe[14] - shutting down
14:41 ottomata: disabling puppet on stat1 for decom
14:37 mutante: ms-be 1-12, Tampa Swift boxes, shutdown
14:24 mutante: ms-fe[14] - stop puppet,revoke certs,remove icinga
13:54 mutante: ms-be1-12 - removing from puppet,salt,icinga
13:06 mutante: Bugzilla Apache, changed SSL cipher suite in I7e9adc182dc, might cost a few % performance but zirconium had plenty
11:48 hashar: removing mw-jenkinsbot (the wikimedia jenkins installation) from #wikimedia-labs
10:10 hashar: Jenkins upgraded to 1.532.3.
10:06 hashar: Upgrading Jenkins to latest LTS version 1.532.3
07:57 mutante: DNS update - remove api.svc, arptest.pmtpa ..
06:31 logmsgbot: demon synchronized wmf-config/InitialiseSettings.php 'Next round of wikis done building Cirrus indexes, throw into beta mode'
04:04 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Apr 18 04:04:21 UTC 2014 (duration 4m 20s)
03:06 logmsgbot: LocalisationUpdate completed (1.24wmf1) at 2014-04-18 03:06:06+00:00
02:39 logmsgbot: LocalisationUpdate completed (1.23wmf22) at 2014-04-18 02:39:51+00:00
00:21 logmsgbot: ori synchronized wmf-config/CommonSettings.php 'Ie9b265be9: Enable GlobalCssJs on testwiki & test2wiki (2/2)'
00:21 logmsgbot: ori synchronized wmf-config/InitialiseSettings.php 'Ie9b265be9: Enable GlobalCssJs on testwiki & test2wiki (1/2)'
00:20 logmsgbot: ori updated /a/common to Ie9b265be9: Enable GlobalCssJs on testwiki & test2wiki
00:18 logmsgbot: ori Finished scap: Cherry-pick Ibe8e67ebf for MobileFrontend on 1.23wmf22 and 1.24wmf1; add GlobalCssJs extension to 1.24wmf1 and 1.23wmf22 (duration: 32m 53s)

April 17

23:45 logmsgbot: ori Started scap: Cherry-pick Ibe8e67ebf for MobileFrontend on 1.23wmf22 and 1.24wmf1; add GlobalCssJs extension to 1.24wmf1 and 1.23wmf22
23:39 logmsgbot: ori scap failed: CalledProcessError Command '/usr/local/bin/mw-update-l10n' returned non-zero exit status 1 (duration: 00m 24s)
23:38 logmsgbot: ori Started scap: Cherry-pick Ibe8e67ebf for MobileFrontend on 1.23wmf22 and 1.24wmf1; add GlobalCssJs extension to 1.24wmf1
23:37 logmsgbot: ori updated /a/common to I2a2abd7f3: Add GlobalCssJs to extension-list
23:29 logmsgbot: ori synchronized wmf-config/InitialiseSettings.php 'I52378a4b4: Add meta to legalteamwiki import sources'
23:28 logmsgbot: ori updated /a/common to I52378a4b4: Add meta to legalteamwiki import sources
23:27 logmsgbot: ori synchronized wmf-config/CommonSettings.php 'I373df6138: Normalize TextExtracts config handling (2/2)'
23:27 logmsgbot: ori synchronized wmf-config/InitialiseSettings.php 'I373df6138: Normalize TextExtracts config handling (1/2)'
23:26 logmsgbot: ori updated /a/common to I373df6138: Normalize TextExtracts config handling
23:24 logmsgbot: ori synchronized wmf-config/InitialiseSettings.php 'I7841f74b0: Kill all vestiges of $wgMFRemovableClasses (2/2)'
23:24 logmsgbot: ori synchronized wmf-config/mobile.php 'I7841f74b0: Kill all vestiges of $wgMFRemovableClasses (1/2)'
23:23 logmsgbot: ori updated /a/common to I7841f74b0: Kill all vestiges of $wgMFRemovableClasses
23:18 logmsgbot: ori synchronized wmf-config/InitialiseSettings.php 'I1795c70d1: Create a FeaturedFeed for the Tech News bulletin (2/2)'
23:17 logmsgbot: ori synchronized wmf-config/FeaturedFeedsWMF.php 'I1795c70d1: Create a FeaturedFeed for the Tech News bulletin (1/2)'
23:16 logmsgbot: ori updated /a/common to I1795c70d1: Create a FeaturedFeed for the Tech News bulletin
23:14 logmsgbot: ori synchronized php-1.23wmf22/extensions/ApiSandbox 'I9a56b2c5a: Update ApiSandbox'
23:12 K4-713: updates antifraud rules in payments
22:03 andrewbogott: updated default labs precise image (heartbleed fix)
20:41 manybubbles: elastic1016 restarted and not freaking out any more.
20:37 _joe_: restarting gitblit in order to prevent crippling due to the usual memory leak
20:28 manybubbles: restarting elastic1016 - it is freaking out. If it happens again I'll dig deeper, but for now I consider it a fluke of the rolling restarts today....
20:20 RobH: sorry for the misc-web-lb issues folks, they should be resolved at this time (for now)
20:19 paravoid: lvs1002/1005: commenting first resolv.conf entry until we have a more permanent fix, restarting pybal
20:18 paravoid: disabling puppet on lvs1002/lvs1005
19:57 RobH: still working on issue
19:57 RobH: both cp1043 and cp1044 seem online and serving nginx service, but pybal says they are down; still working
19:46 ottomata: power off emery
19:40 RobH: replacing ticket.wikimedia.org cert/key, apache may hiccup
19:33 RobH: blog.w.o cert replacement successful
19:30 ottomata: disabling puppet on emery for decommission
19:29 RobH: blog.w.o certificate swap (yes, again ;), apache may hiccup
19:10 logmsgbot: reedy synchronized wmf-config/
19:09 logmsgbot: reedy synchronized database lists files: I6fc44d3eb829d656d352dab652148dd327b06679
19:04 ottomata: reinstalling elastic1001
18:59 logmsgbot: faidon synchronized wmf-config/CommonSettings.php 'reenable CN CrossWiki Hiding'
18:58 logmsgbot: faidon updated /a/common to Ie95165065: Reenable CentralNotice CrossWiki Hiding
18:57 logmsgbot: reedy synchronized wmf-config/
18:34 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Touch for I0c36c65bb9f405e03b84d3f6c6b93acda522c5c9'
18:33 logmsgbot: reedy synchronized database lists files: I0c36c65bb9f405e03b84d3f6c6b93acda522c5c9
18:30 ottomata: switching erbium udp2log instance from consuming multicast relay to unicast direct from varnishes
18:21 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.24wmf1
18:10 ottomata: stopping puppet on elastic1001 and elastic1002, reinstalling elastic1002
18:02 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikipedias to 1.23wmf22
16:06 logmsgbot: reedy Finished scap: testwiki to 1.24wmf1 and build l10n cache (duration: 26m 06s)
15:47 logmsgbot: anomie synchronized php-1.23wmf22/extensions/VisualEditor 'SWAT: 126913 - backport to wmf22 of critical fixes for the Math extension's VisualEditor tool'
15:47 logmsgbot: anomie synchronized php-1.23wmf22/extensions/Math 'SWAT: 126913 - backport to wmf22 of critical fixes for the Math extension's VisualEditor tool'
15:40 logmsgbot: reedy Started scap: testwiki to 1.24wmf1 and build l10n cache
15:28 logmsgbot: faidon synchronized wmf-config/CommonSettings.php 'disable CN CrossWiki Hiding again'
15:27 logmsgbot: faidon updated /a/common to If74ba5a52: Revert "Enable CentralNotice CrossWiki Hiding"
15:17 manybubbles: upgraded site plugins on Elasticsearch nodes
15:04 ottomata: reinstalling elastic1016
14:02 logmsgbot: reedy updated /a/common to I290bd1ea6: Remove further pmtpa remnants
13:41 manybubbles: synced experimental highlighter to elasticsearch nodes - they'll pick it up on restart
11:05 logmsgbot: reedy synchronized wmf-config/ 'I290bd1ea628563646c02651041fa2cec4a320b56'
10:56 mutante: lvs3,lvs4,lvs5,lvs6 - shutdown
10:42 mutante: lvs1, lvs2 shutdown
10:15 mutante: re-deleting unaccepted salt keys for virt2,5-11
10:11 mutante: lvs1-6 - disable puppet,salt,revoke certs,keys
08:28 mutante: db35,db38 - shutdown
07:47 mutante: db35,db38, stop puppet and salt, revoke certs,keys
07:45 mutante: restarting gitblit
03:49 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Apr 17 03:49:19 UTC 2014 (duration 49m 18s)
03:06 subbu: deployed Parsoid 0bccf02c (deploy SHA 5e25f3b05) @ 1:30 pm PST, Apr 16th, 2014
03:02 logmsgbot: LocalisationUpdate completed (1.23wmf22) at 2014-04-17 03:02:25+00:00
02:33 logmsgbot: LocalisationUpdate completed (1.23wmf21) at 2014-04-17 02:33:47+00:00
02:03 springle: stop mysqld on db35 (m1) for decom
01:42 springle: xtrabackup db63 to db60

April 16

23:19 logmsgbot: mwalker Finished scap: SWAT deploy: configuration change 126223 and multimediaviewer 126852 (duration: 04m 07s)
23:15 logmsgbot: mwalker Started scap: SWAT deploy: configuration change 126223 and multimediaviewer 126852
20:03 RobH: osmium cleared from salt, puppetca, and puppetstoredconfig for reinstall with trusty (ignore any icinga alerts, there are no pages)
18:58 ottomata: reinstalling elastic1015
17:54 logmsgbot: reedy synchronized php-1.23wmf22/includes/jobqueue/ 'I4b4dbe4637dc50cd4630ef19d54f01efba10e138'
17:09 paravoid: starting swiftrepl on copper for eqiad->esams copy
17:08 logmsgbot: ori synchronized wmf-config/InitialiseSettings.php 'I7b6e5c2d7: Enable web fonts by default on Hebrew Wikisource'
17:07 logmsgbot: ori synchronized fc-list 'Ib7b2bc21a: updated fonts list and sorted it, rt #810'
17:06 logmsgbot: ori updated /a/common to I7b6e5c2d7: Enable web fonts by default on Hebrew Wikisource
16:37 logmsgbot: reedy synchronized php-1.23wmf21/includes/jobqueue/JobQueueRedis.php 'I678ab55ae3678b5cd944393f2f2048851625f153'
16:36 logmsgbot: reedy synchronized php-1.23wmf22/includes/jobqueue/JobQueueRedis.php 'I678ab55ae3678b5cd944393f2f2048851625f153'
15:38 ottomata: reinstalling elastic1012
13:50 manybubbles: restarting elastic1009 to suck up new config
13:50 manybubbles: raised the number of replicas for labswiki's search directly in elasticsearch because I can't easily do it for cirrus due to access restrictions
13:45 ottomata: reinstalling elastic1011
13:22 mutante: DNS update - remove virt5-15
12:11 mutante: virt5-11 - shut down
11:40 akosiaris: upgraded python-voluptuous on apt.wikimedia.org to 0.8.2-1wmf1
11:39 hashar: Upgraded Zuul to wmf-deploy-20140416-3 (bring in a84f0e4 - "Make queue processing more efficient" which was much needed)
11:29 hashar: upgraded Zuul to wmf-deploy-20140416-2
11:15 mutante: virt5-11 removing from icinga
11:03 mutante: virt5-11 revoked puppet certs and salt keys
10:56 mutante: stopping puppet on virt5-11
10:47 hashar: Upgraded Zuul on gallium to wmf-deploy-20140416 (depends on python-voluptuous 0.7+ , Alexandros packaged 0.8.2 which I manually installed to validate).
09:26 mutante: disabling mw1163 in pybal
07:03 mutante: zirconium - upgrading apache2, php5 packages
06:07 springle: stop mysqld on db38 (x1) for decom
03:46 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Apr 16 03:46:23 UTC 2014 (duration 46m 22s)
02:55 logmsgbot: LocalisationUpdate completed (1.23wmf22) at 2014-04-16 02:55:28+00:00
02:28 logmsgbot: LocalisationUpdate completed (1.23wmf21) at 2014-04-16 02:28:01+00:00
00:43 K4-713: updated listener credentials on thulium

April 15

23:25 logmsgbot: mwalker synchronized wmf-config/abusefilter.php '126168 more abuse filter configuration fun'
23:21 logmsgbot: mwalker Finished scap: Configuration change 126163 and MultimediaViewer 126158 (duration: 02m 15s)
23:19 logmsgbot: mwalker Started scap: Configuration change 126163 and MultimediaViewer 126158
23:08 logmsgbot: mwalker Finished scap: Configuration changes, 113656, 121834, 126065 (duration: 03m 11s)
23:05 logmsgbot: mwalker Started scap: Configuration changes, 113656, 121834, 126065
23:01 hashar: restarting Zuul to clear leaked file descriptor (known issue, fixed upstream)
22:12 awight: crm updated from e3f2859 to 7dafce5
21:51 manybubbles: restarting elastic1009 again
21:39 hashar: jenkins /var/lib/git cleaned up on gallium
21:16 manybubbles: restarting elastic1009 to test performance changes. cluster will go yellow for a few minutes. might go red (wikitech is busted)
21:15 hashar: Jenkins is processing jobs again
21:14 hashar: cleared /tmp/ on integration-slave1002 (filled up by hhvm job, known issue, bug filed already)
21:12 hashar: Zuul locked again :/ Unpooling and repooling Jenkins slaves.
19:50 RoanKattouw: Restarting stuck Jenkins
19:31 manybubbles: setting refresh interval on elasticsearch indexes to 30s to test effect on load
19:24 logmsgbot: reedy synchronized wmf-config/
19:20 logmsgbot: reedy synchronized php-1.23wmf22/includes/PrefixSearch.php 'I82b5ca65864099c180d915055c43e6839bd4f4a2'
19:07 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikisources back to 1.23wmf22
19:07 ottomata: reinstalling elastic1010
19:07 logmsgbot: reedy synchronized php-1.23wmf22/extensions/ProofreadPage
18:41 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikisources back to 1.23wmf21 due to ProofreadPage fatal
18:36 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Non Wikipedias to 1.23wmf22
17:09 paravoid: stopped pybal on lvs1005
17:06 cmjohnson1: fixing lvs1005 eth1 cable
16:56 cmjohnson1: mw1057 replacing ethernet cable
16:50 manybubbles: raised "new generation" size on elastic1009 to test a performance theory
16:50 cmjohnson1: mw1093 replacing ethernet cable
16:40 cmjohnson1: replacing eth cable on mw1193
16:31 hashar: ... all Jenkins jobs are using /srv/ssd/gerrit instead
16:30 hashar: gallium had two Gerrit replications streams, one of them got removed 122419 thus deleting the target directories under /var/lib/git
16:22 cmjohnson1: shutting down mw1163 to replace DIMM
16:18 cmjohnson1: swapping bad disk slot 4 on dataset1001
16:13 paravoid: moving ms-fe3xxx/ms-be3xxx to private1-esams
16:05 ottomata: reinstalling elastic1009
15:21 logmsgbot: anomie synchronized php-1.23wmf21/extensions/Flow 'SWAT: Flow: Prevent logspam on enwiki 125930'
15:13 logmsgbot: anomie synchronized php-1.23wmf21/extensions/Flow 'SWAT: Flow: Prevent logspam on enwiki 125930'
15:02 mutante: DNS update - removing Tampa service IPs
13:51 hashar: Jenkins compressing console logs of builds. On gallium as user jenkins : find /var/lib/jenkins/jobs -wholename '*/builds/*/log' -type f -exec gzip --best {} \;
13:42 hashar: Command executed (as gerritslave user): find /srv/ssd/gerrit -type d -name '*.git' -exec bash -c 'echo; date; cd {}; echo; pwd; echo; git repack -ad; date;' \;
13:41 hashar: Repacking Gerrit replicated repositories on lanthanum and gallium (both under /srv/ssd/gerrit/ )
13:13 andrewbogott: shutdown and decommissioned virt12
12:19 paravoid: adding ms-be101[345] to Swift eqiad's rings, at 33% weight; old rings kept at ms-fe1001:~/swift-2014-04-14
11:30 mutante: DNS update - removed dbdump.pmtpa.wmnet
11:26 mutante: DNS update - remove db64,db65,db66,db67,db70
10:55 mutante: db64,db67 - powerdown via mgmt
10:51 mutante: db65,db66 - shutdown
10:07 mutante: db70 - powerdown via mgmt
09:47 mutante: db64-67 - puppetstoredconfigclean.rb db${db}.pmtpa.wmnet ; puppetca --clean db${db}.pmtpa.wmnet ; salt-key -d db${db}.pmtpa.wmnet
07:02 springle: shutdown db67 for decom. analytics data is backed up on dbstore1002
06:47 springle: moving pmtpa m1 and x1 slaves to db73 and db69 on 12th floor
03:25 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Apr 15 03:25:52 UTC 2014 (duration 25m 51s)
02:42 logmsgbot: LocalisationUpdate completed (1.23wmf22) at 2014-04-15 02:42:48+00:00
02:22 logmsgbot: LocalisationUpdate completed (1.23wmf21) at 2014-04-15 02:22:54+00:00
00:09 gwicke: enabled wikinews family in Parsoid with temporary live patch to un-break VE deploy
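
Note on the 09:47 entry above: the `db${db}` shorthand implies a loop over db64 to db67 that the log does not show. A minimal sketch of that expansion follows, assuming the three commands were run once per host in the order logged. The helper name `decom_commands` and its parameters are hypothetical, and the code only builds command strings; it executes nothing.

```python
# Hypothetical helper: expands the "db${db}" shorthand from the 09:47 entry
# into one set of cleanup commands per host. Strings only; nothing is run.
COMMAND_TEMPLATES = [
    "puppetstoredconfigclean.rb {fqdn}",
    "puppetca --clean {fqdn}",
    "salt-key -d {fqdn}",
]


def decom_commands(first, last, domain="pmtpa.wmnet"):
    """Return the cleanup commands for hosts db<first>..db<last>, in order."""
    commands = []
    for number in range(first, last + 1):
        fqdn = f"db{number}.{domain}"
        # Keep the per-host order from the log: stored config, cert, salt key.
        commands.extend(t.format(fqdn=fqdn) for t in COMMAND_TEMPLATES)
    return commands


if __name__ == "__main__":
    for line in decom_commands(64, 67):
        print(line)
```

Printing the commands first allows a review of the host list before anything is pasted into a shell.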

April 14

23:43 logmsgbot: ori Finished scap: (no message) (duration: 04m 31s)
23:39 ori: scap: php-1.23wmf22/extensions/VisualEditor 2b0979f...0652ad2 (I12e5c9751)
23:38 logmsgbot: ori Started scap: (no message)
23:17 logmsgbot: ori synchronized php-1.23wmf22/skins/vector/variables.less 'Ibcdaff017: Revert body font stack to be just sans-serif'
23:15 logmsgbot: ori synchronized wmf-config/InitialiseSettings.php 'I22f25730d: Enable VisualEditor for opt-in on Meta (2/2)'
23:15 logmsgbot: ori synchronized visualeditor.dblist 'I22f25730d: Enable VisualEditor for opt-in on Meta (1/2)'
23:14 logmsgbot: ori updated /a/common to I22f25730d: Enable VisualEditor for opt-in on Meta
23:12 logmsgbot: ori synchronized visualeditor.dblist 'I59f5a6e0b: Enable VisualEditor on French Wikinews'
23:12 logmsgbot: ori updated /a/common to I59f5a6e0b: Enable VisualEditor on French Wikinews
22:56 logmsgbot: aaron synchronized wmf-config/PoolCounterSettings-eqiad.php 'Limit large (djvu) file downloads for thumbnails'
20:37 mwalker: updating payments wiki for worldpay currencies (from af35b7b to 8a93c17)
20:13 subbu: deployed Parsoid fba548cbf (deploy repo sha d0e12ddf)
17:47 paravoid: fixing /e/n/interfaces for static configuration: gadolinium hafnium labsdb1001 labsdb1002 labsdb1003 labstore1001 searchidx1001 ssl1005 ssl1006 ssl1009 virt1001 ytterbium
17:37 paravoid: fixing /e/n/interfaces for static configuration for cp40xx, lvs40xx
17:14 mutante: brewster - power down, could not revive due to disk or SATA controller fail
16:57 ottomata1: shutting down elastic1006 for reinstall
16:45 mutante: powering brewster back on
16:40 paravoid: powering up brewster
16:13 mutante: deleted old svn apache config on formey, started apache
15:22 paravoid: restarting virt0's salt-master, glance-api, glance-registry, keystone, nova-scheduler
15:11 logmsgbot: manybubbles synchronized wmf-config/CirrusSearch-common.php 'SWAT Cirrus update to improve performance'
15:09 logmsgbot: manybubbles synchronized php-1.23wmf21/extensions/CirrusSearch/ 'SWAT deploy to improve performance'
14:48 paravoid: upgrading all snapshot* hosts
14:38 paravoid: upgrading all packages & staggered restart of all of swift (ms-fe/ms-be)
13:22 logmsgbot: reedy synchronized php-1.23wmf22/includes/api/ApiFeedRecentChanges.php 'I268d0a53067738ba96bee74c593358b0b28cc083'
13:22 logmsgbot: reedy synchronized php-1.23wmf21/includes/api/ApiFeedRecentChanges.php 'I268d0a53067738ba96bee74c593358b0b28cc083'
13:15 paravoid: staggered upgrades for all pending updates on all mw* boxes & restarting apaches/other core services
11:08 mutante: brewster - shut down
10:49 logmsgbot: reedy synchronized wmf-config/interwiki.cdb 'Updating interwiki cache'
10:05 apergos: had to toss extensions/Elastica on virt1000 and run git submodule update --init --recursive seems to be working now
09:26 mutante: deleting huge pybal log on lvs3001
09:01 mutante: brewster - stop lighttpd,bacula-fd,haproxy,dhcp3-server,rsync,nrpe,salt
07:54 mutante: brewster - disabling puppet agent, removed from site.pp, revoke puppet cert
03:23 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Apr 14 03:22:58 UTC 2014 (duration 22m 57s)
02:42 logmsgbot: LocalisationUpdate completed (1.23wmf22) at 2014-04-14 02:42:05+00:00
02:23 logmsgbot: LocalisationUpdate completed (1.23wmf21) at 2014-04-14 02:22:58+00:00

April 13

03:19 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Apr 13 03:19:45 UTC 2014 (duration 19m 44s)
02:39 logmsgbot: LocalisationUpdate completed (1.23wmf22) at 2014-04-13 02:39:11+00:00
02:20 logmsgbot: LocalisationUpdate completed (1.23wmf21) at 2014-04-13 02:20:49+00:00

April 12

05:03 logmsgbot: ori updated /a/common to I5f900190c: Replace $channel with $variant; make it Beta-only
03:21 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Apr 12 03:21:43 UTC 2014 (duration 21m 42s)
02:42 logmsgbot: LocalisationUpdate completed (1.23wmf22) at 2014-04-12 02:42:07+00:00
02:23 logmsgbot: LocalisationUpdate completed (1.23wmf21) at 2014-04-12 02:23:02+00:00

April 11

23:55 RoanKattouw: Restarting stuck Jenkins
23:35 K4-713: synchronized payments to af35b7b
23:25 K4-713: synchronized payments to b321163
19:50 ottomata: upgraded wikitech to MediaWiki 1.23wmf22, applied security patch
18:19 ottomata: rebooting elastic1003
18:14 Krinkle: git-deploy: Deploying integration/slave-scripts I38b90e8c08d7cb
18:08 Krinkle: git-deploy: Deploying integration/slave-scripts I04d8e308daedb3ccb8
17:41 Krinkle: git-deploy: Deploying integration/slave-scripts 'Ia9ee438fa2675170'
14:27 ottomata: reinstalling elastic1005
04:33 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Apr 11 04:33:40 UTC 2014 (duration 33m 39s)
03:47 logmsgbot: LocalisationUpdate completed (1.23wmf22) at 2014-04-11 03:47:01+00:00
02:41 logmsgbot: LocalisationUpdate completed (1.23wmf21) at 2014-04-11 02:41:33+00:00
02:29 ori_: graphite: carbon instance 'f' saturates a cpu core. it's the instance that mediawiki profiling data gets hashed to. collector should probably emit to statsd and have statsd compute per-minute rollups
00:06 marktraceur: leaving MultimediaViewer slightly broken on enwiki based on the fact that logged-in users seem mostly unaffected and other wikis aren't seeing issues, will investigate more tomorrow and fix on Monday

April 10

23:54 bd808: Enabled beta update Jenkins job (https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/)
23:37 logmsgbot: mwalker Finished scap: Attempting to regenerate i18n keys for multimediaviewer (duration: 03m 33s)
23:34 logmsgbot: mwalker Started scap: Attempting to regenerate i18n keys for multimediaviewer
23:16 logmsgbot: mwalker synchronized wmf-config/filebackend.php
23:09 logmsgbot: mwalker synchronized wmf-config/InitialiseSettings.php 'touched to see if that pushes changes to FileBackend.php'
23:03 mwalker: sync-common for 125340 and 125335
22:53 logmsgbot: krinkle synchronized php-1.23wmf21/extensions/VisualEditor/modules/ve-mw/ui/tools/ 'touch *.js'
22:35 logmsgbot: krinkle synchronized php-1.23wmf21/extensions/VisualEditor/modules/ve-mw/ui/tools/ve.ui.MWReferenceDialogTool.js 'touch'
22:12 logmsgbot: krinkle synchronized php-1.23wmf21/extensions/VisualEditor/lib/ve/modules/ve/ui/ve.ui.Toolbar.js 'touch'
22:10 logmsgbot: krinkle synchronized php-1.23wmf21/extensions/VisualEditor/lib/ve/lib/oojs-ui/oojs-ui.js 'touch'
22:10 logmsgbot: krinkle synchronized php-1.23wmf21/resources/startup.js 'touch'
22:10 logmsgbot: krinkle synchronized php-1.23wmf21/resources/oojs-ui/oojs-ui.js 'touch'
21:50 Krinkle: VisualEditor throws uncaught error on load for 1.23wmf21 wikis (bug 63791)
21:15 bd808: Disabled beta update Jenkins job (https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/) so that scap testing can happen in beta.
19:29 ottomata: reinstalling elastic1004
19:19 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'I5501078cee871fb9df03e085547b7a047ef5bd7e'
19:16 logmsgbot: ori synchronized wmf-config/InitialiseSettings-labs.php 'Ia79b1b848: Work around bug 63780 by specifying a siteParamsCallback'
19:15 logmsgbot: ori updated /a/common to Ia79b1b848: Work around bug 63780 by specifying a siteParamsCallback
18:44 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'touch'
18:43 logmsgbot: reedy synchronized database lists files: Enable MediaViewer on mediawikiwiki
18:42 logmsgbot: reedy synchronized docroot and w
18:41 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 wikis to 1.23wmf22
18:37 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikipedias to 1.23wmf21
18:33 logmsgbot: reedy synchronized php-1.23wmf21/extensions/MultimediaViewer
16:55 ottomata: shutting down elastic1003 for reinstall and reformat
16:42 logmsgbot: reedy updated /a/common to Ie72029103: Add/update symlinks
16:40 logmsgbot: reedy Finished scap: testwiki to 1.23wmf22 and build l10n cache (duration: 24m 45s)
16:15 logmsgbot: reedy Started scap: testwiki to 1.23wmf22 and build l10n cache
16:14 logmsgbot: reedy updated /a/common to I2cccebdd7: wikidatawiki back to 1.23wmf21
15:03 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikidatawiki back to 1.23wmf21...
15:00 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Apr 10 15:00:01 UTC 2014 (duration 27m 49s)
14:17 logmsgbot: LocalisationUpdate completed (1.23wmf21) at 2014-04-10 14:17:16+00:00
13:49 logmsgbot: LocalisationUpdate completed (1.23wmf20) at 2014-04-10 13:49:24+00:00
13:31 logmsgbot: reedy Finished scap: l10n cache update for wikidatawiki (duration: 19m 15s)
13:12 logmsgbot: reedy Started scap: l10n cache update for wikidatawiki
12:42 bblack: removed broken pdns_gmetric cronjob on lvs boxes
09:44 logmsgbot: ori synchronized wmf-config/CommonSettings.php 'I107179a27: Move HHVM extension blacklist below extract($globals) so it isn't simply clobbered'
09:42 logmsgbot: ori updated /a/common to I107179a27: Move HHVM extension blacklist below extract($globals) so it isn't simply clobbered
09:39 hashar: Zuul processed its backlog. Had to disconnect/reconnect the labs slaves. There is some weird bug occurring :-(
09:29 hashar: Jenkins: disabling Gearman client in https://integration.wikimedia.org/ci/configure and reenabling it
09:20 hashar: Jenkins unpooling both slave labs using the web interface and killing the Jenkins client running as jenkins-deploy. Will repool so the job can be reregistered properly (bug 63760)
09:11 mutante: DNS update - removing ms6
09:04 hashar: Jenkins bunch of jobs are not being triggered properly. Taking traces.
08:55 mutante: ms6 - shutdown -h now
08:42 mutante: forcing Bugzilla logout for all users
08:19 logmsgbot: aude synchronized php-1.23wmf20/extensions/Wikidata
08:09 logmsgbot: aude synchronized php-1.23wmf20/extensions/Wikidata
07:57 logmsgbot: aude rebuilt wikiversions.cdb and synchronized wikiversions files: Rebuild wikiversions and put wikidata on 1.23wmf20
07:53 logmsgbot: aude synchronized wikiversions.json 'Put Wikidata back on 1.23wmf20, due to localisation cache issues'
07:21 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Apr 10 07:21:16 UTC 2014 (duration 7m 21s)
06:46 logmsgbot: LocalisationUpdate completed (1.23wmf21) at 2014-04-10 06:45:59+00:00
06:27 logmsgbot: LocalisationUpdate completed (1.23wmf20) at 2014-04-10 06:27:30+00:00
06:27 logmsgbot: ori synchronized wmf-config/CommonSettings.php 'I20bbe05cc: Avoid using bits on beta-hhvm.wmflabs.org'
06:26 logmsgbot: ori updated /a/common to I20bbe05cc: Avoid using bits on beta-hhvm.wmflabs.org
06:15 ori: Some interface messages are missing on wikidata.org. Started a manual l10nupdate.
04:11 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Apr 10 04:11:47 UTC 2014 (duration 11m 46s)
03:19 logmsgbot: LocalisationUpdate completed (1.23wmf21) at 2014-04-10 03:19:18+00:00
03:01 logmsgbot: ori synchronized multiversion/MWMultiVersion.php 'Ibdbac982b: Update multiversion regexp for *.beta-hhvm.wmflabs.org'
03:01 logmsgbot: ori updated /a/common to Ibdbac982b: Update multiversion regexp for *.beta-hhvm.wmflabs.org
02:22 logmsgbot: LocalisationUpdate completed (1.23wmf20) at 2014-04-10 02:22:11+00:00
01:49 logmsgbot: ori synchronized wmf-config/InitialiseSettings-labs.php 'I697f7e4a6: Use $channel to branch on interpreter'
01:48 logmsgbot: ori synchronized wmf-config/CommonSettings.php 'I697f7e4a6: Use $channel to branch on interpreter'
01:48 logmsgbot: ori updated /a/common to I697f7e4a6: Use '$channel' to branch on interpreter
01:08 K4-713: updated payments to e1d00b61a703
01:06 Krinkle: git-deploy: Deploying integration/slave-scripts If2539c
01:05 Krinkle: Undid local patch to "grunt-lib-phantomjs/phantomjs/main.js" (for bug 63579) in "/srv/deployment/integration/slave-scripts" on gallium
00:20 logmsgbot: yurik synchronized wmf-config/InitialiseSettings.php
00:08 awight: updated crm from e726e42 to e3f2859
00:06 K4-713: updated payments to 70dce8f4bc7

April 9

23:36 logmsgbot: ebernhardson synchronized php-1.23wmf21/extensions/Math/modules/VisualEditor/ve.ui.MWMathInspectorTool.js 'Update Math VE tool to use a command in 1.23wmf21'
23:32 logmsgbot: ebernhardson synchronized wmf-config/CommonSettings.php 'Update Flow cache version'
23:22 logmsgbot: ebernhardson synchronized php-1.23wmf21/extensions/Flow 'Backport fix DB-to-cache pipeline for 1.23wmf21'
23:05 logmsgbot: ebernhardson synchronized wmf-config/InitialiseSettings-labs.php 'Enable math VE plugin on labs'
23:04 Krinkle: Jenkins and Zuul are back up. Queues have not been preserved.
23:01 ^d: gerrit: reloaded bugzilla plugin to force it to log back in
23:00 Krinkle: Restarting Jenkins because I have no clue what is going on and have no time to investigate yet another random clogging of all jobs. Restart ought to fix it.
22:54 Krinkle: Zuul has lots of queued jobs for npm slaves, but neither Jenkins nor integration-slave1001.eqiad.wmflabs and 1002 themselves have anything queued. They're idle, responsive and waiting for jobs.
22:47 Krinkle: Jenkins slaves in labs seem to be down. Zuul is stacking up jobs for hasNpm nodes (integration slaves in labs). Both slaves have 7/7 executors idle.
22:33 hoo: Logged out all Bugzilla users by deleting all session cookie data from mysql
19:15 logmsgbot: csteipp synchronized php-1.23wmf21/extensions/CentralAuth/maintenance
19:10 logmsgbot: yurik synchronized wmf-config/InitialiseSettings.php
17:38 logmsgbot: yurik synchronized wmf-config/InitialiseSettings.php
17:22 manybubbles: regenerating Elasticsearch index from mediawiki for testwiki to soak up geo changes.
16:48 logmsgbot: maxsem synchronized wmf-config/InitialiseSettings.php 'https://gerrit.wikimedia.org/r/124880'
16:41 manybubbles: reindexed testwiki to soak up geo changes
16:37 logmsgbot: maxsem synchronized wmf-config/InitialiseSettings.php 'https://gerrit.wikimedia.org/r/124876'
16:32 logmsgbot: maxsem synchronized php-1.23wmf21/extensions/GeoData
16:28 manybubbles: fiddling with Elasticsearch cluster balancing options trying to get enwiki better balanced
16:17 logmsgbot: aude synchronized php-1.23wmf21/extensions/Wikidata 'Switch Wikidata back to previous version of Wikibase'
15:52 mutante: ms6 - revoke puppet cert, salt key, remove from icinga
15:02 ottomata: stopped puppet on emery to test sqstat on analytics1003
14:48 ottomata: disabling puppet to test sqstat on analytics1003
14:14 RobH: otrs back up, live hacked apache change, now working permanent puppet change (puppet is disabled on iodine at present)
14:02 RobH: yes, otrs is totally ssl borked, robh is working on it
14:00 mutante: adding filippo to ops/wmf LDAP groups
13:58 RobH: updating otrs cert
09:19 logmsgbot: hashar synchronized wmf-config/InitialiseSettings.php '[] = 'musees.cg70.fr'; 124754 bug 63449'
08:39 hashar: Gerrit Letting JenkinsBot submit changes on apps/android/*
03:33 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Apr 9 03:33:25 UTC 2014 (duration 33m 24s)
02:43 logmsgbot: LocalisationUpdate completed (1.23wmf21) at 2014-04-09 02:43:52+00:00
02:19 logmsgbot: LocalisationUpdate completed (1.23wmf20) at 2014-04-09 02:19:25+00:00
01:25 Krinkle: Bug 63579 is still happening occasionally. Leaving patch on gallium in place for now.
01:09 ori: Debugging uWSGI init scripts on tungsten; expect some Graphite / Gdash flapping.
00:15 ori: graphite webapp 502 caused by uwsgi's init script not restarting the service correctly
00:07 Krinkle: graphite.wikimedia.org (e.g. https://graphite.wikimedia.org/render/?) is serving 502 Bad Gateway, ori is investigating
00:04 Krinkle: To investigate bug 63579, manually patched "grunt-lib-phantomjs/phantomjs/main.js" in "/srv/deployment/integration/slave-scripts" on gallium

April 8

23:34 logmsgbot: mwalker synchronized php-1.23wmf21/extensions/MultimediaViewer/ 'Updating MultimediaViewer for 124510'
23:08 logmsgbot: csteipp synchronized php-1.23wmf21/extensions/CentralAuth/maintenance 'Push maintenance script for token reset'
21:21 logmsgbot: bd808 Purged l10n cache for 1.23wmf19
21:20 logmsgbot: bd808 Purged l10n cache for 1.23wmf18
21:12 logmsgbot: bd808 rebuilt wikiversions.cdb and synchronized wikiversions files: group1 to 1.23wmf21
19:58 manybubbles: finished upgrading to Elasticsearch 1.1.0. The process went well with no issues other than knocking out search in labs 3 times for 30 seconds apiece and logging lots of nasty warnings to irc. I've started the process to fix search in labs so it won't happen again.
19:56 manybubbles: upgraded all elasticsearch servers except elastic1008. that is coming now.
18:45 logmsgbot: ori synchronized wmf-config/InitialiseSettings.php 'I4b18e4ce8: Change wgServer and wgCanonicalServer for arbcom wikis'
18:45 logmsgbot: ori updated /a/common to I4b18e4ce8: Change wgServer and wgCanonicalServer for arbcom wikis
18:13 logmsgbot: bd808 Finished scap: group0 wikis to 1.23wmf21 (with patch for bug 63659) (duration: 03m 18s)
18:10 logmsgbot: bd808 Started scap: group0 wikis to 1.23wmf21 (with patch for bug 63659)
18:01 logmsgbot: hoo synchronized wmf-config/InitialiseSettings.php 'Touch to clear config. cache'
17:56 hoo: changed the Wikidata wb_changes_dispatch position of all wikiquote wikis to 118158153
17:37 logmsgbot: hoo synchronized php-1.23wmf20/extensions/Wikidata/extensions/Wikibase/lib/resources/wikibase.Site.js 'touch'
17:37 logmsgbot: aude synchronized wmf-config/Wikibase.php 'bump wgCacheEpoch for wikidata after enabling wikiquote site links'
17:35 ottomata: restarted gmetad on nickel to fix ganglia
17:29 logmsgbot: aude synchronized wikidataclient.dblist 'Enable Wikibase on Wikiquote'
17:29 logmsgbot: aude synchronized wmf-config 'config changes to enable Wikibase on Wikiquote'
17:22 logmsgbot: aude synchronized wmf-config/CirrusSearch-labs.php 'config change for beta, to enable highlighting'
17:16 manybubbles: finished upgrading elastic1001-1006. starting on 1007. yay progress.
17:03 logmsgbot: aude synchronized php-1.23wmf20/extensions/Wikidata 'Update Wikidata build, to allow populating sites table on wikiquote'
16:31 aude: added sites and site_identifiers core tables on wikiquote
16:28 hashar: Jenkins: killed jenkins-slave java process on gallium and repooled gallium slave. It was no longer registered in Zuul :-/
14:32 manybubbles: no harm done, just lost time
14:32 manybubbles: woops, just restarted elastic1002. silly me
14:31 manybubbles: upgrading elastic1001
13:54 manybubbles: they'll pick it up during the rolling restart today to upgrade to 1.1.0
13:53 manybubbles: synced first Elasticsearch plugin to production Elasticsearch servers
13:46 RobH: upgraded libssl on holmium
13:39 Jeff_Green: update & reboot tellurium
13:39 RobH: replacing the blog cert, if holmium crashes I didn't do it correctly.
13:37 mutante: restarting gitblit
12:56 logmsgbot: reedy updated /a/common to Id15ddc665: Revert "Group0 wikis to 1.23wmf21"
10:21 Jeff_Green: update & reboot barium
10:15 Jeff_Green: update & reboot samarium
07:47 _joe|away: restarted nginx on cp1044 and cp1043
05:47 apergos: shot many old apache processes running as stats user from 2013, on stat1001 (restarting apache runs it as www-data user)
05:39 apergos: restarted apache on fenari magnesium ytterbium antimony
05:31 _joe_: upgraded openssl on cp10* and cp30* servers as well
04:46 Tim: on dataset1001: upgraded libssl and restarted lighttpd
04:43 Tim: restarted apache on the above list, failed on labs-ns1, virt1000, ytterbium
04:41 Tim: upgraded libssl on zirconium.wikimedia.org,neon.wikimedia.org,netmon1001.wikimedia.org,iodine.wikimedia.org,ytterbium.wikimedia.org,gerrit.wikimedia.org,virt1000.wikimedia.org,labs-ns1.wikimedia.org,stat1001.wikimedia.org
04:38 Ryan_Lane: upgrading libssl on virt0
04:37 Ryan_Lane: upgrading libssl on virt1000
04:15 Tim: also upgraded libssl on cp4001-4019. Restarted nginx on these servers and also the previous list.
04:03 Tim: upgrading libssl on ssl1001,ssl1002,ssl1003,ssl1004,ssl1005,ssl1006,ssl1007,ssl1008,ssl1009,ssl3001.esams.wikimedia.org,ssl3002.esams.wikimedia.org,ssl3003.esams.wikimedia.org
03:11 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Apr 8 03:11:04 UTC 2014 (duration 11m 3s)
02:34 logmsgbot: LocalisationUpdate completed (1.23wmf21) at 2014-04-08 02:34:56+00:00
02:16 logmsgbot: LocalisationUpdate completed (1.23wmf20) at 2014-04-08 02:15:58+00:00
01:06 logmsgbot: bd808 Finished scap: revert group0 to 1.23wmf21 (testwiki still on 1.23wmf21) (duration: 09m 54s)
00:56 logmsgbot: bd808 Started scap: revert group0 to 1.23wmf21 (testwiki still on 1.23wmf21)
00:54 logmsgbot: bd808 scap aborted: group0 to 1.23wmf21 (testing python change for mwversionsinuse) (again) (duration: 00m 25s)
00:54 logmsgbot: bd808 Started scap: group0 to 1.23wmf21 (testing python change for mwversionsinuse) (again)
00:53 logmsgbot: bd808 scap aborted: group0 to 1.23wmf21 (testing python change for mwversionsinuse) (duration: 02m 57s)
00:50 logmsgbot: bd808 Started scap: group0 to 1.23wmf21 (testing python change for mwversionsinuse)
00:25 logmsgbot: catrope synchronized php-1.23wmf21/extensions/VisualEditor 'it helps if you run git submodule update first'
00:24 logmsgbot: catrope synchronized php-1.23wmf20/extensions/VisualEditor 'it helps if you run git submodule update first'

April 7

23:58 logmsgbot: catrope synchronized php-1.23wmf20/extensions/VisualEditor/ 'VisualEditor bug fixes'
23:57 logmsgbot: catrope synchronized php-1.23wmf20/skins/vector/variables.less 'Remove troublesome fonts from font stack'
23:50 logmsgbot: catrope synchronized php-1.23wmf21/resources/oojs-ui/ 'Update OOJS-UI for bug fixes'
23:50 logmsgbot: catrope synchronized php-1.23wmf21/extensions/VisualEditor/ 'VisualEditor bug fixes'
23:49 logmsgbot: catrope synchronized php-1.23wmf21/skins/vector/variables.less 'Remove troublesome fonts from font stack'
23:45 logmsgbot: catrope synchronized php-1.23wmf21/extensions/VisualEditor/ 'VisualEditor bug fixes'
23:42 logmsgbot: catrope synchronized php-1.23wmf21/resources/oojs-ui/ 'Update OOJS-UI for bug fixes'
23:42 logmsgbot: catrope synchronized php-1.23wmf21/skins/vector/variables.less 'Remove troublesome fonts from font stack'
23:17 logmsgbot: catrope synchronized wmf-config/InitialiseSettings.php 'SWAT changes: other projects bar on frwikisource, import sources'
23:06 logmsgbot: bd808 Finished scap: test2wiki to 1.23wmf20 (duration: 16m 48s)
22:49 logmsgbot: bd808 Started scap: test2wiki to 1.23wmf20
22:38 logmsgbot: bd808 Finished scap: test2wiki to 1.23wmf21 (duration: 12m 07s)
22:26 logmsgbot: bd808 Started scap: test2wiki to 1.23wmf21
22:12 logmsgbot: bd808 Finished scap: Testing 1.23wmf21 l10n changes (duration: 01m 31s)
22:10 logmsgbot: bd808 Started scap: Testing 1.23wmf21 l10n changes
22:07 logmsgbot: bd808 Finished scap: Testing 1.23wmf21 l10n changes (duration: 03m 49s)
22:04 logmsgbot: bd808 Started scap: Testing 1.23wmf21 l10n changes
21:13 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Apr 7 21:13:14 UTC 2014 (duration 1m 59s)
20:40 logmsgbot: LocalisationUpdate completed (1.23wmf21) at 2014-04-07 20:40:02+00:00
20:23 logmsgbot: LocalisationUpdate completed (1.23wmf20) at 2014-04-07 20:23:14+00:00
19:08 ottomata: temporarily disabling puppet on analytics 1009, 1010, 1019, 1020 to bring up new journalnodes
18:39 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Apr 7 18:39:21 UTC 2014 (duration 15m 30s)
18:23 logmsgbot: aaron synchronized wmf-config/PoolCounterSettings-eqiad.php 'Added "downloadtiff" pool counter config'
18:13 AaronSchulz: shwiki queue finished emptying out in staggered loop on terbium
18:03 logmsgbot: LocalisationUpdate completed (1.23wmf21) at 2014-04-07 18:03:48+00:00
17:36 logmsgbot: LocalisationUpdate completed (1.23wmf20) at 2014-04-07 17:36:02+00:00
17:24 bd808: Manually running l10nupdate with new --verbose flag to capture log output
14:34 MaxSem: Rebuilding GeoData index
14:09 hashar: Jenkins cleared swap on gallium (swapoff -a && swapon -a). Makes ganglia graph nicer :D
13:15 apergos: reenabled puppet on dataset2, testing done
12:23 apergos: disabled puppet on dataset2, testing
11:19 logmsgbot: reedy Finished scap: because we're scappy... (rebuilding l10n cache for 1.23wmf21 (duration: 18m 04s)
11:01 logmsgbot: reedy Started scap: because we're scappy... (rebuilding l10n cache for 1.23wmf21
10:45 hashar: integration Getting PHP Composer installed on labs slaves. 124305
09:21 paravoid: reactivating peerings with HE, issues reportedly resolved
09:04 hashar: restarted Zuul
08:54 hashar: gallium killed console-kit-daemon process which was eating a lot of memory
08:42 hashar: Restarting Jenkins, out of Java heap space. Something is leaking memory
08:41 hashar: Jenkins being broken for some reason AGAIN !
05:04 ori: Zuul is stuck: <http://i.imgur.com/o5ghCam.jpg> (617kb image)
02:56 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Apr 7 02:56:12 UTC 2014 (duration 56m 11s)
02:20 logmsgbot: LocalisationUpdate completed (1.23wmf21) at 2014-04-07 02:20:10+00:00
02:13 logmsgbot: LocalisationUpdate completed (1.23wmf20) at 2014-04-07 02:13:42+00:00

April 6

02:53 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Apr 6 02:53:13 UTC 2014 (duration 53m 12s)
02:18 logmsgbot: LocalisationUpdate completed (1.23wmf21) at 2014-04-06 02:18:43+00:00
02:13 logmsgbot: LocalisationUpdate completed (1.23wmf20) at 2014-04-06 02:12:57+00:00
01:32 jamesofu_: sugar down for move to labs

April 5

04:15 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Apr 5 04:15:00 UTC 2014 (duration 50m 39s)
03:56 logmsgbot: andrew rebuilt wikiversions.cdb and synchronized wikiversions files: Revert mw.org, test2wiki and testwikidatawiki to 1.23wmf20 due to localisation issue
03:51 Andrew: Reverting mw.org, test2 and test.wikidata back to 1.23wmf20
03:41 logmsgbot: LocalisationUpdate completed (1.23wmf21) at 2014-04-05 03:41:36+00:00
03:36 logmsgbot: LocalisationUpdate completed (1.23wmf20) at 2014-04-05 03:36:04+00:00
03:23 Andrew: Actually, going to rerun l10nupdate first just to check.
03:22 Andrew: Going to revert deployment of 1.23wmf21 again - still broken
03:08 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Apr 5 03:08:33 UTC 2014 (duration 8m 32s)
02:34 logmsgbot: LocalisationUpdate completed (1.23wmf21) at 2014-04-05 02:34:54+00:00
02:14 logmsgbot: LocalisationUpdate completed (1.23wmf20) at 2014-04-05 02:14:07+00:00

April 4

21:34 logmsgbot: bd808 Finished scap: Group0 to 1.23wmf21 (again) (duration: 14m 35s)
21:19 logmsgbot: bd808 Started scap: Group0 to 1.23wmf21 (again)
19:28 hashar: Jenkins: unpooled slave agent on lanthanum, killed the java agent on it and repooled it.
19:22 hashar: Jenkins is processing jobs again. Queue unchanged so it will resume everything
19:16 hashar: restarting Jenkins
19:07 hashar: Jenkins unpooling gallium slave
19:05 hashar: Zuul / Jenkins stalled again.
18:43 csteipp: redeployed updated patch for bug63251 to fix a reported bug
16:10 _joe_: restarting gitblit, for the last time today
15:06 _joe_: restarting gitblit as it has eaten up all of its ram again and is trashing cpu
12:32 mutante: hume - shutting down
12:06 mutante: hume - disable puppet/salt/monitoring
11:13 mutante: restarting gitblit with new option to use incremental GC in an attempt to fix timeouts caused by GC eating CPU
08:07 paravoid: deactivating cr1-eqiad<->HE peerings, transatlantic par2<->ash1 is congested
07:25 mutante: restarting gitblit
05:45 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Apr 4 05:45:07 UTC 2014 (duration 18m 25s)
04:56 logmsgbot: LocalisationUpdate completed (1.23wmf21) at 2014-04-04 04:56:06+00:00
04:45 logmsgbot: LocalisationUpdate completed (1.23wmf20) at 2014-04-04 04:45:01+00:00
04:20 logmsgbot: demon rebuilt wikiversions.cdb and synchronized wikiversions files: unbreak test2.wp and test.wikidata as well
04:17 logmsgbot: demon rebuilt wikiversions.cdb and synchronized wikiversions files: mw.org back to 1.23wmf20
03:43 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Apr 4 03:43:03 UTC 2014 (duration 43m 2s)
03:28 ori: Interface messages are missing on group0 / 1.23wmf21 wikis (mediawikiwiki, testwiki, test2wiki, and testwikidata)
02:50 logmsgbot: LocalisationUpdate completed (1.23wmf21) at 2014-04-04 02:50:26+00:00
02:24 logmsgbot: LocalisationUpdate completed (1.23wmf20) at 2014-04-04 02:24:51+00:00
01:08 logmsgbot: krinkle synchronized php-1.23wmf21/resources 'I6e93d9ab0e4a926c09c'

April 3

22:00 logmsgbot: demon synchronized wmf-config/CirrusSearch-production.php 'lowering cache time, for testing'
21:55 logmsgbot: demon updated /a/common/php-1.23wmf20 to Ic853ebff4: Cherry-pick I550eb4b0a8fa18344e8b0de3ec85d61c2122ffb8
21:54 logmsgbot: demon synchronized php-1.23wmf20/extensions/CirrusSearch 'Cirrus back to master again'
21:50 logmsgbot: ori synchronized multiversion/updateBitsBranchPointers 'updateBitsBranchPointers: get rid of 'static-stable' branch link'
21:50 logmsgbot: ori updated /a/common to Ic1602c045: updateBitsBranchPointers: get rid of 'static-stable' branch link
21:46 logmsgbot: demon synchronized php-1.23wmf20/extensions/CirrusSearch 'Rolling back to 1.23wmf20 branch point from master'
21:38 logmsgbot: demon synchronized php-1.23wmf20/extensions/CirrusSearch 'Updating Cirrus to master'
21:33 logmsgbot: demon synchronized wmf-config/CirrusSearch-production.php 'italian wikis getting interwiki search. they're my favorite beta testers'
19:23 logmsgbot: reedy synchronized docroot and w
19:21 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 wikis to 1.23wmf21
19:17 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias actually to 1.23wmf20
19:15 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias to 1.23wmf20
19:09 logmsgbot: reedy Finished scap: testwiki to 1.23wmf21 and build l10n cache (duration: 38m 23s)
18:30 logmsgbot: reedy Started scap: testwiki to 1.23wmf21 and build l10n cache
18:23 logmsgbot: reedy updated /a/common to I835c2b1d5: Depool. See RT 7191.
11:10 paravoid: IPv4 eqiad<->esams private link also elevated by ~15ms but no packet loss observed
11:09 paravoid: affects both IPv6 transit at esams (slowdowns) as well as IPv6 eqiad<->esams
11:08 paravoid: deactivating cr1-esams<->HE peering, latency > 160ms, over at 200ms (congestion?); back to 84ms now;
10:51 akosiaris: temporarily stopped squid on brewster
10:26 hashar: Jenkins job mediawiki-core-phpunit-hhvm is back around thanks to 123573
06:28 paravoid: powercycling ms-be1003, unresponsive, no console output
04:43 logmsgbot: springle synchronized wmf-config/db-eqiad.php 'return upgraded DB slaves to normal load'
04:11 logmsgbot: springle synchronized wmf-config/db-eqiad.php 's6 repool db1015, warm up'
04:04 logmsgbot: springle synchronized wmf-config/db-eqiad.php 's6 depool db1015 for upgrade'
04:03 logmsgbot: springle synchronized wmf-config/db-eqiad.php 's5 repool db1037, warm up'
03:53 logmsgbot: springle synchronized wmf-config/db-eqiad.php 's5 depool db1037 for upgrade'
03:53 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Apr 3 03:53:18 UTC 2014 (duration 53m 16s)
03:34 springle: db1020 raid controller dimm ecc errors
03:14 logmsgbot: springle synchronized wmf-config/db-eqiad.php 's4 depool db1020 for upgrade'
03:12 logmsgbot: springle synchronized wmf-config/db-eqiad.php 's3 repool db1019, warm up'
02:57 logmsgbot: springle synchronized wmf-config/db-eqiad.php 's3 depool db1019 for upgrade'
02:56 logmsgbot: springle synchronized wmf-config/db-eqiad.php 's2 repool db1060, warm up'
02:48 logmsgbot: LocalisationUpdate completed (1.23wmf20) at 2014-04-03 02:48:01+00:00
02:47 logmsgbot: springle synchronized wmf-config/db-eqiad.php 's2 depool db1060 for upgrade'
02:45 logmsgbot: springle synchronized wmf-config/db-eqiad.php 's1 repool db1061, warm up'
02:35 logmsgbot: springle synchronized wmf-config/db-eqiad.php 's1 depool db1061 for upgrade'
02:24 logmsgbot: LocalisationUpdate completed (1.23wmf19) at 2014-04-03 02:24:07+00:00

April 2

23:47 logmsgbot: aaron synchronized wmf-config/CommonSettings.php 'Bumped wgJobBackoffThrottling for htmlCacheUpdate to 15'
23:47 mwalker: ... deploy was for mobile frontend 123454
23:46 logmsgbot: mwalker synchronized php-1.23wmf20/extensions/MobileFrontend 'SWAT deploy for MaxSem'
20:23 subbu: deployed Parsoid 33471172 with deploy repo sha 5c620e54
19:03 logmsgbot: ori synchronized php-1.23wmf20/extensions/WikimediaEvents 'Update WikimediaEvents for I7fdaa5524: Use simple random sampling to log deprecated usage at 1:100'
19:03 logmsgbot: ori synchronized php-1.23wmf19/extensions/WikimediaEvents 'Update WikimediaEvents for I7fdaa5524: Use simple random sampling to log deprecated usage at 1:100'
17:00 andrewbogott: fixed updating crons on wikitech-status, I think. Time will tell...
16:19 logmsgbot: manybubbles synchronized wmf-config/InitialiseSettings.php 'Lower timeout on prefix searches and make the cirrus.dblist sync I just did take effect.'
16:19 logmsgbot: manybubbles synchronized cirrus.dblist 'Cirrus as primary for most of group1'
16:14 akosiaris: banned tools-exec-03.eqiad.wmflabs using manual iptables on ytterbium
15:20 ottomata: stopping puppet on stat1
14:27 hashar: Jenkins applying label contintLabsSlave on slaves in labs used for ci (integration-slave1001 and 1002)
14:15 hashar: Jenkins deleting pmtpa slaves (they all have been shutdown and jobs got deleted)
14:00 manybubbles: tried restarting some lsearchd services (carefully) to clear out some crashing when searching for a particular query term. It caused pool queue full errors.... serves me right for trying?
11:20 mutante: running CheckUser/maintenance/purgeOldData.php on all wikis
09:42 akosiaris: rsynced brewster /srv to carbon
09:34 mutante: restarting gitblit on antimony
09:14 mutante: DNS update - removing capella
09:09 mutante: DNS update - removing ms10
05:31 logmsgbot: springle synchronized wmf-config/db-eqiad.php 'normal loads for all upgraded slaves'
04:53 logmsgbot: springle synchronized wmf-config/db-eqiad.php 's1 repool db1062, warm up'
04:45 logmsgbot: springle synchronized wmf-config/db-eqiad.php 's1 depool db1062 for upgrade'
04:42 logmsgbot: springle synchronized wmf-config/db-eqiad.php 's7 repool db1039, warm up'
04:27 logmsgbot: springle synchronized wmf-config/db-eqiad.php 's7 depool db1039 for upgrade'
03:56 logmsgbot: springle synchronized wmf-config/db-eqiad.php 's6 repool db1006, warm up'
03:48 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Apr 2 03:48:31 UTC 2014 (duration 48m 30s)
03:46 logmsgbot: springle synchronized wmf-config/db-eqiad.php 's6 depool db1006 for upgrade'
03:43 logmsgbot: springle synchronized wmf-config/db-eqiad.php 's5 repool db1045, warm up'
03:27 logmsgbot: springle synchronized wmf-config/db-eqiad.php 's5 depool db1045 for upgrade'
03:21 logmsgbot: springle synchronized wmf-config/db-eqiad.php 's4 repool db1059, warm up'
03:07 logmsgbot: springle synchronized wmf-config/db-eqiad.php 's4 depool db1059 for upgrade'
03:04 logmsgbot: springle synchronized wmf-config/db-eqiad.php 's2 repool db1063, warm up'
02:55 logmsgbot: springle synchronized wmf-config/db-eqiad.php 's2 depool db1063 for upgrade'
02:52 logmsgbot: LocalisationUpdate completed (1.23wmf20) at 2014-04-02 02:52:48+00:00
02:29 logmsgbot: LocalisationUpdate completed (1.23wmf19) at 2014-04-02 02:29:18+00:00
02:22 logmsgbot: springle synchronized wmf-config/db-eqiad.php 's3 repool db1027, warm up'
02:03 logmsgbot: springle synchronized wmf-config/db-eqiad.php 's3 depool db1027 for upgrade'
01:16 logmsgbot: ori synchronized php-1.23wmf20/extensions/WikimediaEvents 'Undeployed change from earlier SWAT deploy'
01:16 logmsgbot: ori synchronized php-1.23wmf19/extensions/WikimediaEvents 'Undeployed change from earlier SWAT deploy'
01:05 logmsgbot: ori synchronized php-1.23wmf19/extensions/WikimediaEvents
01:04 logmsgbot: ori synchronized php-1.23wmf19/extensions/EventLogging
01:02 logmsgbot: ori synchronized php-1.23wmf20/extensions/WikimediaEvents
01:02 logmsgbot: ori synchronized php-1.23wmf20/extensions/EventLogging

April 1

23:48 logmsgbot: ebernhardson synchronized php-1.23wmf19/extensions/WikimediaEvents/ 'Update WikimediaEvents to master'
23:48 logmsgbot: ebernhardson synchronized php-1.23wmf19/extensions/EventLogging/ 'Update EventLogging to master'
23:47 logmsgbot: ebernhardson synchronized php-1.23wmf20/extensions/EventLogging/ 'Update EventLogging to master'
23:46 logmsgbot: ebernhardson synchronized php-1.23wmf20/extensions/WikimediaEvents/ 'Update WikimediaEvents to master'
23:32 logmsgbot: ebernhardson synchronized docroot and w
21:42 hashar: Ganglia in labs is more or less back in activity: http://ganglia.wmflabs.org/ No clue what it is graphing though
21:27 hashar: jenkins killed stuck build (5 hours+) of beta-update-databases-eqiad . Might have been blocking Jenkins build queue
19:09 Reedy: ori gracefulled mw1018, mw1050, mw1061, mw1070, mw1139, mw1179
18:45 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: zerowiki to 1.23wmf20
18:43 logmsgbot: reedy updated /a/common to If887effe5: Add zerowiki
18:43 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'touch'
18:42 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Add zerowiki
18:41 logmsgbot: reedy synchronized database lists files:
18:37 logmsgbot: reedy synchronized docroot and w
18:36 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: non wikipedias to 1.23wmf20
18:28 mutante: ms10 - shut down
18:19 mutante: ms10 - disable puppet, revoke puppet cert,salt key,icinga..
18:03 logmsgbot: ori synchronized php-1.23wmf20/includes/profiler/ProfilerSimple.php 'Iad91f1d12: Send profiled items under the correct name'
18:02 logmsgbot: ori synchronized php-1.23wmf19/includes/profiler/ProfilerSimple.php 'Iad91f1d12: Send profiled items under the correct name'
17:34 mutante: logging to eqiad wikitech after Andrew switched over
16:05 andrewbogott: switching wikitech to read-only, migrating to eqiad
15:06 logmsgbot: reedy updated /a/common to If3ca3d486: beta: adjust $wgCaptchaDirectory
15:01 hashar: Gerrit super slow again :-(
14:46 mutante: added oblivion to root-auth-keys
14:17 mutante: welcome new shell user oblivion
14:04 hashar: Gerrit flushed a few caches related to user accounts / LDAP
13:43 mutante: adding oblivion to ops and wmf LDAP groups
08:44 mutante: solr1/2 - revoke puppet certs
08:43 mutante: solr3 - delete salt key, puppet cert
03:11 logmsgbot: springle synchronized wmf-config/db-eqiad.php 's7 db1034 full steam'
02:02 logmsgbot: LocalisationUpdate failed: git pull of extensions failed
01:58 springle: killed research queries on db1047. email me
01:35 springle: restarted sanitarium s3 instance for additional private wikis
00:59 logmsgbot: springle synchronized wmf-config/db-eqiad.php 's7 repool db1034, warm up'