02:03 awight: updated payments from 78b72063e4e0cc76b7e168be1e626d5e10e34d4a to 62c81d4574e5e994ff8f3cac7115eff335bd5265
00:52 bd808: restarted elasticsearch on logstash1001
00:49 awight: updated payments from e81f473acc5b31b49dd27714c40f9b71c3462e26 to 78b72063e4e0cc76b7e168be1e626d5e10e34d4a
00:42 bd808: log2udp events still not making it into logstash; possibly related to earlier elasticsearch cluster issues; I don't want to restart elasticsearch on logstash1001 while the cluster is still recovering from that.
00:33 bd808: restarted logstash on logstash1001; log2udp events not being recorded in elasticsearch
December 30
21:52 bd808: restarted elasticsearch on logstash1002; it had dropped from the cluster
19:06 paravoid: manually stopping acct on neon and setting /etc/default/acct ACCT_ENABLE to 0
16:38 godog: killing uwsgi on tungsten, blew memory
14:46 Nemo_bis: morebots is being rude today
14:36 logmsgbot: hoo Synchronized wmf-config/CommonSettings.php: Enable unregistered users editing on it.m.wikipedia.org after Dec 31 (duration: 00m 06s)
December 29
20:19 awight: payments updated from ce7fb9af37c4bba2a84668387b61729df4f9723c to e81f473acc5b31b49dd27714c40f9b71c3462e26
10:35 godog: reboot ms-be2011, stuck while removing a LD, no console
December 27
23:33 paravoid: restarting puppetmasters
20:29 gwicke: dropped old keyspaces titan{,2,3} on xenon to free space for titan4
19:53 ori: gallium: restarted jenkins
16:19 Reedy: jenkins started again...
16:17 Reedy: jenkins killed
16:12 Reedy: attempting to kill jenkins
16:11 Reedy: jenkins is hung with high cpu/memory usage
12:55 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: bump up s1 api load sent to db1066 (duration: 00m 06s)
12:11 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: repool db1066 and db1028, warm up (duration: 00m 06s)
05:49 ori: cerium disk space critical, so moved /mnt/data/cassandra/java_{1418354329,1418533386,1418537719}.hprof to /tmp/hprof_files, freeing up ~17G of space.
December 25
20:44 _joe_: restarting hhvm on mw1239, stuck in HPHP::is_valid_var_name probably after trying to call ini_set
00:32 logmsgbot: hoo Synchronized wmf-config/Bug54847.php: Fix for invalid hashes (this prevented some people from logging in) (duration: 00m 05s)
00:26 logmsgbot: spage Synchronized php-1.25wmf12/extensions/PageTriage/modules/ext.pageTriage.views.toolbar/ext.pageTriage.delete.js: Unbreak page curation on enwiki for Xmas (duration: 00m 05s)
20:45 qchris: restarted webperf service statsd-mw-js-deprecate on hafnium. It seems it did not send metrics to statsd after an EventLogging restart.
02:59 logmsgbot: mattflaschen Synchronized php-1.25wmf13/resources/src/jquery.tipsy/jquery.tipsy.js: Fix "live" deprecated live mode of jQuery tipsy (duration: 00m 05s)
02:59 logmsgbot: mattflaschen Synchronized php-1.25wmf13/resources/src/jquery.tipsy/jquery.tipsy.js: Fix "live" deprecated live mode of jQuery tipsy (duration: 00m 05s)
02:59 logmsgbot: mattflaschen Synchronized php-1.25wmf12/resources/src/jquery.tipsy/jquery.tipsy.js: Fix "live" deprecated live mode of jQuery tipsy (duration: 00m 05s)
02:57 logmsgbot: mattflaschen Synchronized php-1.25wmf13/extensions/PageTriage/modules/ext.pageTriage.views.toolbar/: Fix to PageTriage not to use jQuery live (duration: 00m 07s)
02:57 logmsgbot: mattflaschen Synchronized php-1.25wmf12/extensions/PageTriage/modules/ext.pageTriage.views.toolbar/: Fix to PageTriage not to use jQuery live (duration: 00m 05s)
01:18 awight: payments rolled back to 3dde7be76284aa37b74038dfb4473671999dfcff
00:57 awight: payments updated from 3dde7be76284aa37b74038dfb4473671999dfcff to ce7fb9af37c4bba2a84668387b61729df4f9723c
23:39 awight: payments rolled back ab93b636fae7bcb38a155c019ad102f3b071918c --> 3dde7be76284aa37b74038dfb4473671999dfcff
23:28 awight: payments updated from 3dde7be76284aa37b74038dfb4473671999dfcff to ab93b636fae7bcb38a155c019ad102f3b071918c
23:23 awight: rollback payments to 3dde7be76284aa37b74038dfb4473671999dfcff
23:18 awight: updated payments from 3dde7be76284aa37b74038dfb4473671999dfcff to ab93b636fae7bcb38a155c019ad102f3b071918c
21:27 awight: update crm from ae7b2381667dd65d68812c58f61e3ea66fa9fa6f to 80241fd2a43f03796b416d728661470f875a590a
17:54 hoo: Manually transferred the email from enwiki account "Hob Gadling" to the centralauth account of the same name (after a partially failed account creation).
13:54 akosiaris: purged unpuppetized rrdcached from hafnium. It was segfaulting when started via the init script, which led to the package being unconfigured which led to dpkg alerts on icinga
13:42 _joe|lunch: restarting hhvm on a few servers
13:11 _joe_: restarted hhvm on mw1242, stuck in getrusage()
13:03 _joe_: restarted hhvm on mw1191, load at 200
13:00 paravoid: salt-cleaning up /etc/sudoers.d/50_* (old naming scheme)
12:04 godog: upload carbon-c-relay 0.36+git20141218-1 to trusty-wikimedia
09:13 hashar: enabled MediaWiki core 'structure' PHPUnit tests for all extensions. Will require folks to fix their incorrect AutoLoader and ResourceLoader entries. gerrit:180496, bug T78798
06:23 _joe|justawake: restarted the puppetmaster on palladium
01:42 logmsgbot: ori Synchronized php-1.25wmf12/resources/src/mediawiki.action/mediawiki.action.edit.stash.js: Ibb29a825c: mediawiki.action.edit.stash: set timeout to 4 seconds (duration: 00m 05s)
01:31 awight: update crm from f1e558592ee98ff8fc84d19ff2c0435619e11242 to ae7b2381667dd65d68812c58f61e3ea66fa9fa6f
23:34 hashar: Restarted Jenkins and Zuul again to have a clean start while I am crashing to bed.
23:22 logmsgbot: demon Synchronized wmf-config/StartProfiler.php: xhprof on all hhvm hosts in eqiad (duration: 00m 05s)
22:46 hashar: restarting Jenkins
21:45 hashar: killing Jenkins
21:41 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.25wmf13
21:40 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias to 1.25wmf12
21:38 logmsgbot: reedy Finished scap: testwiki to 1.25wmf13 and build l10n cache (duration: 12m 26s)
21:25 logmsgbot: reedy Started scap: testwiki to 1.25wmf13 and build l10n cache
21:24 Reedy: @ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @ for mw1152
20:36 hashar: Jenkins/Zuul had some deadlock. Disconnected/reconnected slaves but that did not fix it. Finally had to disconnect/reconnect the Gearman client in Jenkins and it is processing again.
20:36 logmsgbot: reedy Started scap: testwiki to 1.25wmf13 and build l10n cache
20:12 hashar: Jenkins: some slaves are no longer properly registered. Unpooling / repooling them
16:42 logmsgbot: demon Synchronized php-1.25wmf12/extensions/TextExtracts/: (no message) (duration: 00m 05s)
16:27 logmsgbot: demon Synchronized wmf-config/CirrusSearch-labs.php: for completeness (duration: 00m 05s)
16:20 ^d: mw1190: manually ran sync-common since it was yelling about my key earlier
16:14 logmsgbot: demon Synchronized php-1.25wmf12/includes/specials/SpecialSearch.php: (no message) (duration: 00m 06s)
16:10 logmsgbot: demon Synchronized php-1.25wmf12/extensions/Wikidata/: (no message) (duration: 00m 12s)
14:52 akosiaris: uploaded apertium-nno-nob_1.0.0+svn~57977-1 to apt.wikimedia.org
14:43 anomie: Merged and fetched gerrit:180477, so undeployed bad extension changes from gerrit:180229 are no longer a danger
13:47 akosiaris: uploaded apertium-nob_0.1.0+svn~58076-1 and apertium-nno_0.1.0+svn~58076-1 to apt.wikimedia.org
11:59 _joe_: removing some core dumps from appservers, so that we don't run out of space by tomorrow
02:42 logmsgbot: ori Synchronized php-1.25wmf12/includes/parser/MWTidy.php: I4909e5e20: use stream_select() to get external tidy stdout/stderr (uncommitted; pending review) (duration: 00m 33s)
16:21 logmsgbot: marktraceur Synchronized php-1.25wmf11/extensions/MultimediaViewer/: [SWAT] [wmf11] - Track the most recent upload time for performance events (Media Viewer) (duration: 00m 05s)
16:12 logmsgbot: marktraceur Synchronized php-1.25wmf12/extensions/Wikidata/: [SWAT] [wmf12] - Update test.wikidata (fixes/polish for changes to the site link section, and performance improvements for page views). (duration: 00m 24s)
15:31 godog: upload diamond 3.5-3 to trusty-wikimedia
14:01 godog: reinstall python-twisted-bin python-twisted-core python-twisted-web on labmon1001
14:00 robh: zinc removed from icinga, system is now shutdown for reclaim per RT8939
13:50 robh: reclaiming zinc to spares, stopped puppet agent
13:13 akosiaris: uploaded hfst_3.8.1~r4088-1 to apt.wikimedia.org (trusty)
22:06 logmsgbot: ori Synchronized wmf-config/StartProfiler.php: (no message) (duration: 00m 06s)
19:47 ottomata1: initiating kafka preferred-replica-election to bring analytics1021 back in to leadership :/ need to figure this out, or replace this node soon.
18:15 YuviPanda: ran sudo logrotate -f /etc/logrotate.d/dumpwikidatajson on snapshot1003 for hoo
18:13 logmsgbot: ori Synchronized wmf-config/StartProfiler.php: (no message) (duration: 00m 08s)
16:42 akosiaris: uploaded apertium-sv-da, apertium-en-ca to apt.wikimedia.org
15:13 hashar: Zuul: reverting Zuul back to wmf-deploy-20141030-4. I previously reverted it to another change which was wrong.
14:50 hashar: upgrading python-statsd on Zuul server and restarting service.
14:37 godog: upload python-statsd 3.0.1-1 to precise-wikimedia
14:13 godog: upload python-statsd 3.0.1-1 to trusty-wikimedia
11:43 YuviPanda: force puppet run on all labs hosts via salt
09:33 ori: restarted mwprof on tungsten
09:20 logmsgbot: ori Synchronized wmf-config/StartProfiler.php: I63864cc79: xenon log: collate stack samples and fold into single lines (duration: 00m 06s)
18:41 godog: restart profiler-to-carbon on tungsten to pick up changes, including hhvm-profiler-to-carbon
18:20 logmsgbot: ori Synchronized php-1.25wmf12/resources/src/mediawiki.action/mediawiki.action.edit.stash.js: Ib2de3f15: Stash edit when user idles (duration: 00m 05s)
18:16 ejegg: updated dash from 08b078acf904d563030ff7a37b2af8df88387e29 to 6631a97e5e3e688bc0f4d2a1f6f5d97744dba0f4
17:41 ottomata: starting trusty upgrade of analytics1019
17:18 paravoid: restarting apache on strontium
17:06 _joe_: restarting HHVM on mw1237, stuck in HPHP::StatCache::refresh
16:59 godog: restarted gmond on ms-fe1001, all swift machines under this aggregator were showing offline
16:50 logmsgbot: marktraceur Synchronized wmf-config/: [SWAT] [config] Actually Revert 'Configure logging to use MWLoggerMonologSpi' (duration: 00m 10s)
16:44 logmsgbot: marktraceur Synchronized wmf-config/: [SWAT] [config] Revert Configure logging to use MWLoggerMonologSpi (duration: 00m 05s)
16:44 ottomata: starting trusty upgrade of analytics1011
16:43 logmsgbot: marktraceur Synchronized wmf-config/: [SWAT] [config] Configure logging to use MWLoggerMonologSpi (duration: 00m 07s)
16:25 logmsgbot: marktraceur Synchronized php-1.25wmf12/extensions/WikimediaEvents/WikimediaEvents.php: [SWAT] [wmf12] Bump sendBeacon schema revision so new URL will be generated (duration: 00m 16s)
16:23 logmsgbot: marktraceur Synchronized php-1.25wmf11/extensions/WikimediaEvents/WikimediaEvents.php: [SWAT] [wmf11] Bump sendBeacon schema revision so new URL will be generated (duration: 00m 14s)
15:59 ottomata: starting trusty upgrade of analytics1033
15:04 hashar: @damons we love you!
15:01 hashar: saved Jenkins configuration via the web interface to reset the interface language from Chinese to English
13:31 logmsgbot: springle Synchronized wmf-config/db-eqiad.php: pool db1004 in s7, warm up (duration: 00m 06s)
06:02 ori: restarted apache on palladium and strontium
04:08 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Dec 11 04:08:50 UTC 2014 (duration 8m 49s)
04:08 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Dec 11 04:08:33 UTC 2014 (duration 30m 22s)
03:34 logmsgbot: ori Synchronized php-1.25wmf11/extensions/Math: Ic438b307a3b46: Fix for fatal caused by static call to MathRenderer::getError (duration: 00m 06s)
14:00 paravoid: running dpkg --remove-architecture i386 (trusty); rm /etc/dpkg/dpkg.cfg.d/multiarch (precise) across the whole fleet with the exception of gallium/lanthanum
11:46 _joe_: cleaning and vacuuming the HHVM cache on a few hosts
09:00 _joe_: cleaning and vacuuming the hhvm repo on mw1030
08:38 logmsgbot: ori Synchronized php-1.25wmf11/extensions/CommonsMetadata: (no message) (duration: 00m 07s)
08:10 logmsgbot: ori Synchronized php-1.25wmf11/extensions/CommonsMetadata/TemplateParser.php: Update CommonsMetadata for cherry-picks (duration: 00m 05s)
00:47 logmsgbot: ori Synchronized php-1.25wmf10/extensions/Math: Ic438b307a3b46: Fix for fatal caused by static call to MathRenderer::getError (duration: 00m 07s)
00:42 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: Set $wgWMEStatsdBaseUri for WikimediaEvents / statsv (duration: 00m 07s)
22:01 logmsgbot: ori Synchronized php-1.25wmf11/includes/api/ApiStashEdit.php: I5c296325: Various edit stash fixes (duration: 00m 06s)
20:53 legoktm: ran update revision set rev_page="8555535" where rev_page="6628330"; on frwiki
20:48 legoktm: ran update revision set rev_page="8555529" where rev_page="1469156"; on frwiki (for T76979)
20:46 YuviPanda: started /usr/local/bin/dumpwikidatajson.sh on snapshot1003 per hoo, after killing php processes from earlier start as well as from the earlier botched kill
20:44 ottomata: renaming all webrequest varnishkafka instances
20:37 YuviPanda: started /usr/local/bin/dumpwikidatajson.sh on snapshot1003 per hoo, to re-start dump script aborted earlier
20:33 logmsgbot: reedy Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 06s)
17:27 logmsgbot: csteipp Synchronized php-1.25wmf11/extensions/Listings/Listings.body.php: (no message) (duration: 00m 07s)
17:07 godog: powercycle ms-be1012, no console
16:42 _joe_: depooling mw1201-04
16:41 _joe_: repooling mw1194-1200
16:34 logmsgbot: anomie Synchronized wmf-config/CommonSettings-labs.php: Deploy some Labs-only changes so they're not showing as undeployed (duration: 00m 05s)
16:31 logmsgbot: anomie Synchronized php-1.25wmf11/extensions/Wikidata/: SWAT: Fix issue with json dump and sites caching in Wikidata gerrit:178533 (duration: 00m 15s)
16:29 logmsgbot: anomie Synchronized php-1.25wmf10/includes/filerepo/file/File.php: SWAT: Fix for broken thumbnails when the file width is in $wgThumbnailBucket gerrit:178529 (duration: 01m 04s)
16:18 logmsgbot: anomie Synchronized php-1.25wmf11/includes/filerepo/file/File.php: SWAT: Fix for broken thumbnails when the file width is in $wgThumbnailBucket gerrit:178531 (duration: 00m 08s)
15:36 qchris: restarted EventLogging's m2 writer on vanadium. Events did not get written into the database.
08:13 _joe_: depooling mw1115-1119 from the api pool, reimaging
08:06 _joe_: restarting diamond on all appservers
06:01 logmsgbot: ori Synchronized php-1.25wmf11/extensions/Math/MathInputCheckTexvc.php: Fix for fatal caused by static call to MathRenderer::getError (duration: 00m 06s)
06:01 logmsgbot: ori Synchronized php-1.25wmf10/extensions/Math/MathInputCheckTexvc.php: Fix for fatal caused by static call to MathRenderer::getError (duration: 00m 06s)
04:27 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Dec 9 04:27:31 UTC 2014 (duration 27m 30s)
19:19 csteipp: deployed patches for T77028 and T76686
19:13 logmsgbot: ori Finished scap: I5a7e258d2: Optimize how user options are delivered to the client (duration: 26m 45s)
18:46 logmsgbot: ori Started scap: I5a7e258d2: Optimize how user options are delivered to the client
18:04 YuviPanda: removed restbase/ from graphite for T77172 on tungsten
17:54 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: Unset \$wgTidyInternal (duration: 00m 07s)
17:52 manybubbles: rebuilding eswiki's cirrus index to pick up fix for slow prefix searches
17:02 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: enable tidy extension on mw1081 (duration: 00m 06s)
16:56 ottomata: doing controlled restart of kafka broker analytics1021, and then initiating replica election to bring it back into leadership
16:21 hashar: Jenkins: disconnected / reconnected gallium slave from the web interface. It was locked not being able to run the mediawiki/vagrant postmerge doc job
16:08 logmsgbot: demon Synchronized php-1.25wmf11/extensions/VisualEditor: (no message) (duration: 00m 06s)
16:08 logmsgbot: demon Synchronized php-1.25wmf10/extensions/VisualEditor: (no message) (duration: 00m 09s)
15:03 hashar: Broke zuul-cloner by mistake
14:36 godog: reboot graphite1001 for kernel upgrade
13:11 hashar: Restarting zuul and zuul-merger on gallium
13:10 hashar: Zuul: rebasing our fork to bring some upstream changes
10:41 godog: upload diamond 3.5-2 to trusty-wikimedia
09:18 andrewbogott: the failure looked like this: "Unexpected error in mod_passenger: Could not connect to the ApplicationPool server: Broken pipe (32)"
09:18 andrewbogott: graceful'd apache on virt1000 -- resolving a mysterious puppetmaster outage
04:11 springle: puppet fact_values hit auto_inc limit. altered table to restart from 1 to get puppet running (seems safe, but needs checking, maybe also truncate)
03:42 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Dec 8 03:42:28 UTC 2014 (duration 42m 27s)
02:15 logmsgbot: LocalisationUpdate completed (1.25wmf11) at 2014-12-08 02:15:53+00:00
02:15 logmsgbot: l10nupdate Synchronized php-1.25wmf11/cache/l10n: (no message) (duration: 00m 01s)
02:10 logmsgbot: LocalisationUpdate completed (1.25wmf10) at 2014-12-08 02:10:20+00:00
02:10 logmsgbot: l10nupdate Synchronized php-1.25wmf10/cache/l10n: (no message) (duration: 00m 03s)
December 7
03:41 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Dec 7 03:41:52 UTC 2014 (duration 41m 51s)
02:16 logmsgbot: LocalisationUpdate completed (1.25wmf11) at 2014-12-07 02:16:30+00:00
02:16 logmsgbot: l10nupdate Synchronized php-1.25wmf11/cache/l10n: (no message) (duration: 00m 02s)
02:11 logmsgbot: LocalisationUpdate completed (1.25wmf10) at 2014-12-07 02:11:07+00:00
02:11 logmsgbot: l10nupdate Synchronized php-1.25wmf10/cache/l10n: (no message) (duration: 00m 01s)
December 6
22:51 ori: restarted apache on palladium
19:48 Krinkle: Made trivial edit to Jenkins language config to purge the French invasion (default language: en-us -> en-US)
19:39 Krinkle: Jenkins has been conquered by the French again
03:42 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Dec 6 03:42:37 UTC 2014 (duration 42m 36s)
02:18 logmsgbot: LocalisationUpdate completed (1.25wmf11) at 2014-12-06 02:18:52+00:00
02:18 logmsgbot: l10nupdate Synchronized php-1.25wmf11/cache/l10n: (no message) (duration: 00m 01s)
02:13 logmsgbot: LocalisationUpdate completed (1.25wmf10) at 2014-12-06 02:13:12+00:00
02:13 logmsgbot: l10nupdate Synchronized php-1.25wmf10/cache/l10n: (no message) (duration: 00m 01s)
15:49 YuviPanda: repooled mw1220, re-imaging to hhvm complete
15:28 cmjohnson: rebooting analytics1033 to verify bios settings
14:42 YuviPanda: depooling mw1220 for HHVM re-imaging
14:36 YuviPanda: depooling mw1220 to re-image as HHVM
11:54 godog: remove legacy symlink /home/wikipedia/syslog from lithium
11:20 _joe_: repooling mw1061,mw1066-mw1070
09:54 _joe_: repooling mw1060,mw1062-65; depooling mw1067-mw1070 for reimaging
08:00 _joe_: depooling mw1060-mw1067 for reimaging
07:37 _joe_: repooling mw1048-1052
05:11 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: Set $wgTidyInternal to false unconditionally to ease deployment of tidy extension (duration: 00m 06s)
04:23 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Dec 2 04:23:23 UTC 2014 (duration 23m 22s)
02:32 logmsgbot: LocalisationUpdate completed (1.25wmf10) at 2014-12-02 02:32:21+00:00
02:32 logmsgbot: l10nupdate Synchronized php-1.25wmf10/cache/l10n: (no message) (duration: 00m 03s)
02:19 logmsgbot: LocalisationUpdate completed (1.25wmf9) at 2014-12-02 02:19:18+00:00
02:19 logmsgbot: l10nupdate Synchronized php-1.25wmf9/cache/l10n: (no message) (duration: 00m 01s)
01:42 awight: updated dash to b3f4be0bbd6c16be64030607fd9c59cb84111429
01:37 K4-713: updated payments to c0c4bfcdb4fa625fa52
16:37 logmsgbot: anomie Synchronized php-1.25wmf9/extensions/SyntaxHighlight_GeSHi/geshi/geshi.php: SWAT: Fix highly recursive number highlighting regex in GeSHi (duration: 00m 07s)
16:35 logmsgbot: anomie Synchronized php-1.25wmf10/extensions/SyntaxHighlight_GeSHi/geshi/geshi.php: SWAT: Fix highly recursive number highlighting regex in GeSHi (duration: 00m 10s)
16:04 logmsgbot: demon Synchronized wmf-config/abusefilter.php: (no message) (duration: 00m 05s)
16:04 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 05s)
15:49 bd808: restarted logstash on logstash1001; log2udp events were not being processed
15:26 _joe_: depooling mw1047-mw1052
15:24 _joe_: repooling mw1041-mw1046
14:26 _joe_: depooling mw1041-1046
14:16 _joe_: repooling mw1036-mw1040
13:46 _joe_: removing the same files from ocg1002,3 as well
13:44 _joe_: removing cache files from ocg1001, when they're older than 3 days
09:55 _joe_: reimaging mw1033-mw1040 to HHVM, depooling from the main pool now
09:31 _joe_: upgrading hhvm to the latest version across the cluster
04:46 logmsgbot: tstarling Synchronized php-1.25wmf10/includes/parser/MWTidy.php: change previously pulled but scap was apparently not run (duration: 00m 05s)
04:44 logmsgbot: tstarling Synchronized php-1.25wmf9/includes/parser/MWTidy.php: change previously pulled but scap was apparently not run (duration: 00m 06s)
03:34 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Dec 1 03:34:57 UTC 2014 (duration 34m 56s)
02:17 logmsgbot: LocalisationUpdate completed (1.25wmf10) at 2014-12-01 02:17:56+00:00
02:17 logmsgbot: l10nupdate Synchronized php-1.25wmf10/cache/l10n: (no message) (duration: 00m 03s)
02:10 logmsgbot: LocalisationUpdate completed (1.25wmf9) at 2014-12-01 02:10:33+00:00
02:10 logmsgbot: l10nupdate Synchronized php-1.25wmf9/cache/l10n: (no message) (duration: 00m 01s)
00:51 logmsgbot: tstarling Synchronized wmf-config/StartProfiler.php: (no message) (duration: 00m 05s)
November 30
22:51 qchris: Updated EventLogging to 19c23698bc03694017d764af33307d6f035fc224 and restarted it
20:51 qchris: restarted eventlogging mysql-m2-master consumer. It seems it could no longer write to the database.
19:17 Krinkle: Disabling and relaunching Gearman connection from Jenkins.
09:08 logmsgbot: oblivian Synchronized wmf-config/jobqueue-eqiad.php: changing the aggregator address as well (duration: 00m 05s)
07:27 _joe_: restarted the jobrunner service on all jobrunners
07:15 logmsgbot: oblivian Synchronized wmf-config/jobqueue-eqiad.php: (no message) (duration: 00m 05s)
05:50 ori: 3:50 UTC: switch asw-c-eqiad lost connectivity with cabinet C4. Impact: phabricator down; gap in web request logs and some perf monitoring. Job queue and Recent Changes stream OK b/c redundant servers are up.
03:41 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Nov 30 03:41:03 UTC 2014 (duration 41m 2s)
02:18 logmsgbot: LocalisationUpdate completed (1.25wmf10) at 2014-11-30 02:18:22+00:00
02:18 logmsgbot: l10nupdate Synchronized php-1.25wmf10/cache/l10n: (no message) (duration: 00m 02s)
02:11 logmsgbot: LocalisationUpdate completed (1.25wmf9) at 2014-11-30 02:11:01+00:00
02:11 logmsgbot: l10nupdate Synchronized php-1.25wmf9/cache/l10n: (no message) (duration: 00m 01s)
November 29
04:17 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Nov 29 04:17:41 UTC 2014 (duration 17m 40s)
02:25 logmsgbot: LocalisationUpdate completed (1.25wmf10) at 2014-11-29 02:25:53+00:00
02:25 logmsgbot: l10nupdate Synchronized php-1.25wmf10/cache/l10n: (no message) (duration: 00m 01s)
02:13 logmsgbot: LocalisationUpdate completed (1.25wmf9) at 2014-11-29 02:13:35+00:00
02:13 logmsgbot: l10nupdate Synchronized php-1.25wmf9/cache/l10n: (no message) (duration: 00m 01s)
November 28
13:19 YuviPanda: restarted apache on palladium, things are recovering
11:20 qchris: Updated gerrit plugin its-phabricator-from-bugzilla to 97c5f02d3ca6259488a763515251c5cc57a11a51
11:20 qchris: Updated gerrit plugin its-phabricator to 9edf90a182e43bfeea7ebbcb20d4a52b6213600d
03:39 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Nov 28 03:39:52 UTC 2014 (duration 39m 51s)
02:22 logmsgbot: LocalisationUpdate completed (1.25wmf10) at 2014-11-28 02:22:28+00:00
02:22 logmsgbot: l10nupdate Synchronized php-1.25wmf10/cache/l10n: (no message) (duration: 00m 04s)
02:10 logmsgbot: LocalisationUpdate completed (1.25wmf9) at 2014-11-28 02:10:06+00:00
02:10 logmsgbot: l10nupdate Synchronized php-1.25wmf9/cache/l10n: (no message) (duration: 00m 01s)
November 27
17:55 _joe_: restarted hhvm on mw1224, the alarm may have been lost in the puppet failure shower earlier
17:24 godog: removed /var/lib/carbon/whisper/archived/jenkins from tungsten
17:09 godog: upload txstatsd 0.7.0~bzr30-0ubuntu0+14 to precise-wikimedia on carbon
16:49 godog: upload missing txstatsd 1.0.0-1 _source package_ to carbon
16:48 godog: upload missing txstatsd 1.0.0-1 to carbon
15:48 logmsgbot: hoo Synchronized php-1.25wmf10/extensions/Wikidata/: Fixing a data model bug + enable Statements on Properties for testwikidata (duration: 00m 12s)
15:34 logmsgbot: hoo Synchronized wmf-config/Wikibase.php: Set "displayStatementsOnProperties" for wikidata/testwikidata (duration: 00m 06s)
14:48 akosiaris: upgrading librsvg throughout the fleet
04:35 springle: restarted squid3 on carbon, but glitches seem to be upstream
04:23 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Nov 27 04:23:45 UTC 2014 (duration 23m 44s)
22:17 hashar: Bah there can only be one mediawiki-core-doxygen-publish job running, with all the merges that happened on mediawiki/core due to the release, there are currently six of them in the queue. They will all be processed eventually
22:14 hashar: mediawiki/core postmerge changes are stuck because mediawiki-core-doxygen-publish refuses to start. Attempted to retrigger them by promoting a change: gallium$ zuul promote --pipeline postmerge --changes 175960,1
22:13 ejegg: updated crm from d0a51250d2bdbf3c818ec0486af284691c7a61ff to 96f66e6b6c947c4e4c32c4a4a32dc940dc3b1d60
22:08 hashar: investigating Zuul/Jenkins. Jenkins potentially has a deadlock
22:02 cscott: updated Parsoid to version 67e2596c
21:52 hashar: Restarting Gearman client. I am in a meeting, will cleanup later.
21:33 bd808: restarted logstash on logstash1001; log2udp events not being received
16:54 logmsgbot: gwicke Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 05s)
16:34 Krinkle: Changed Jenkins default language from "en_US" to "en" ("Ignore browser settings" was already enabled). Not sure why, but it's back to English now.
16:16 logmsgbot: marktraceur Synchronized wmf-config/: [SWAT] [config] 174793 Enable VisualEditor as a Beta Feature on most remaining wikis (duration: 00m 06s)
16:13 Krinkle: Jenkins is displaying everything in French (both logged-in/logged-out users alike)
16:11 logmsgbot: marktraceur Synchronized php-1.25wmf9/extensions/Flow/: [SWAT] [wmf9] 175941 "Provide user to local LQT api calls" for officewiki. (duration: 00m 08s)
12:25 godog: stopped ocg on ocg1*
12:18 godog: restarting ocg on ocg1001
12:00 godog: removing pdf files older than 14d from ocg100*
11:57 godog: removing pdf files older than 14d from ocg1001
06:48 logmsgbot: tstarling Synchronized w/oauth-headers.php: (no message) (duration: 00m 05s)
06:43 logmsgbot: tstarling Synchronized w/oauth-headers.php: (no message) (duration: 00m 06s)
06:40 logmsgbot: tstarling Synchronized live-1.5/oauth-headers.php: (no message) (duration: 00m 05s)
06:34 logmsgbot: tstarling Synchronized php-1.25wmf9/extensions/OAuth/lib/OAuth.php: (no message) (duration: 00m 05s)
06:09 logmsgbot: tstarling Synchronized php-1.25wmf9/extensions/OAuth/lib/OAuth.php: (no message) (duration: 00m 06s)
06:07 logmsgbot: tstarling Synchronized php-1.25wmf9/extensions/OAuth/lib/OAuth.php: (no message) (duration: 00m 06s)
05:15 logmsgbot: tstarling Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 06s)
04:24 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Wed Nov 26 04:24:14 UTC 2014 (duration 24m 13s)
02:30 logmsgbot: LocalisationUpdate completed (1.25wmf9) at 2014-11-26 02:30:20+00:00
02:30 logmsgbot: l10nupdate Synchronized php-1.25wmf9/cache/l10n: (no message) (duration: 00m 01s)
02:18 logmsgbot: LocalisationUpdate completed (1.25wmf8) at 2014-11-26 02:18:29+00:00
02:18 logmsgbot: l10nupdate Synchronized php-1.25wmf8/cache/l10n: (no message) (duration: 00m 03s)
01:23 bd808: restarted logstash on logstash1001; no events from log2udp relay being recorded
21:55 logmsgbot: ejegg Synchronized wmf-config/CommonSettings.php: Turn CN client-side banner choice back on everywhere (duration: 00m 05s)
21:39 logmsgbot: ejegg Synchronized php-1.25wmf8/extensions/CentralNotice/: One more CentralNotice fix to get out ahead of the winter rush - wmf8 (duration: 00m 07s)
21:22 logmsgbot: ejegg Synchronized wmf-config/CommonSettings.php: Turn CN client-side banner choice back on for selected wmf9 wikis (duration: 00m 05s)
21:15 logmsgbot: ejegg Synchronized php-1.25wmf9/extensions/CentralNotice/: One more CentralNotice fix to get out ahead of the winter rush (duration: 00m 05s)
20:53 Nemo_bis: 100 % packet loss between esams and r1fra1.core.init7.net
16:14 _joe_: pooling mw1237-1258 in the appserver pool
15:03 godog: upload bcache-tools 1.0.7-1 to carbon
12:15 _joe_: pooling mw1221-mw1226 in the API pool
06:37 YuviPanda: restarted apache on strontium, was seeing transient puppetmaster fails
06:08 mutante: in response to jenkins login issue reported by krinkle: /var/lib/jenkins/xml.config on gallium had "virt1000" value for LDAP, earlier Andrew made a switch from there to ldap-eqiad. fixed config, restarted jenkins
06:06 mutante: restarted gitblit
04:31 jgage: restarted jenkins
04:22 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Nov 25 04:22:50 UTC 2014 (duration 22m 49s)
03:53 Krinkle: Jenkins is unable to create new user sessions. Suspect LDAP is having issues.
03:16 springle: m2 db1020 rebuilt, but blocked from dbproxy1002 until replag=0
03:12 logmsgbot: awight Synchronized wmf-config: Disabling CentralNotice client banner choice due to T75812 (duration: 00m 05s)
02:31 logmsgbot: LocalisationUpdate completed (1.25wmf9) at 2014-11-25 02:31:48+00:00
02:31 logmsgbot: l10nupdate Synchronized php-1.25wmf9/cache/l10n: (no message) (duration: 00m 01s)
02:18 logmsgbot: LocalisationUpdate completed (1.25wmf8) at 2014-11-25 02:18:52+00:00
02:18 logmsgbot: l10nupdate Synchronized php-1.25wmf8/cache/l10n: (no message) (duration: 00m 01s)
02:15 mutante: old-bugzilla now behind varnish too, cert issue should be gone
02:07 bblack: all LVS back to normal runtime state w/ new SSL config
01:56 bblack: switching off pybal on primary LVS in esams for HTTPS check
01:54 bblack: switching off pybal on primary LVS in eqiad for HTTPS check
01:51 bblack: esams+eqiad backup LVS converted to new ssl config (lvs100[45] + lvs300[34])
16:28 ottomata: starting upgrade to trusty of analytics1017
16:27 logmsgbot: anomie Synchronized php-1.25wmf8/includes/filebackend: SWAT: Log more details about backend-fail-internal errors gerrit:174128 (duration: 00m 09s)
16:18 bblack: rubidium+eeden gdnsd upgraded to 2.1.0 (baham was already there)
16:06 manybubbles: replaying 20,000 searches at approximately the same speed that they were issued caused only marginal bounce in load (cluster load average was 13% and two machines went about 20%). We're ready from a performance standpoint. yay
16:02 logmsgbot: anomie Synchronized wmf-config/InitialiseSettings.php: Touch a random PHP file, supposedly required (duration: 00m 09s)
16:02 manybubbles: replaying some searches against cirrus to make *super* *duper* sure it won't fall over tomorrow when we enable enwiki
16:01 logmsgbot: anomie Synchronized visualeditor-default.dblist: SWAT: Enable VisualEditor by default on Catalan Wikiquote (cawikiquote) gerrit:174036 (duration: 00m 09s)
16:01 logmsgbot: anomie Synchronized visualeditor.dblist: SWAT: Enable VisualEditor by default on Catalan Wikiquote (cawikiquote) gerrit:174036 (duration: 00m 09s)
15:43 ottomata: starting trusty upgrade of analytics1016
20:19 csteipp: patched bugs 71111 and 71394 in wmf7 and wmf8
20:14 cmjohnson: powering down logstash1003 for a few mins to add disks
19:52 ottomata: starting upgrade to trusty on analytics1023
19:15 awight: campaigns reenabled
18:55 awight: disabling CentralNotice campaigns
17:49 ottomata: preparing for trusty upgrade of analytics1003
16:57 bd808: dropped replica count to 0 for logstash indices from 2014-10-30 and 2014-10-31.
16:49 bd808: restarted elasticsearch on logstash1002
16:46 bd808: dropped replica count to 0 for logstash indices from 2014-10-14 through 2014-10-29. See https://phabricator.wikimedia.org/P73 for the commands.
16:45 ottomata: preparing to upgrade analytics1026 to trusty
16:21 bd808: disk utilization is 94% on logstash1002, 92% on logstash1001 and 91% on logstash1003. Too much data in indices even with replica count bumped down to 1 for the small disks we have today.
16:16 bd808: logstash elasticsearch cluster is pretty messed up. logstash1002 has lost shards for all indices except for today, and it's master for that one.
16:14 logmsgbot: manybubbles Synchronized wmf-config/CirrusSearch-production.php: SWAT reenable regex search now that it will not crash elasticsearch (duration: 00m 04s)
16:11 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: SWAT force summary when running checkuser query on all wikis (duration: 00m 04s)
16:01 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: SWAT revert JPG thumbnail chaining on all wikis except commons (duration: 00m 05s)
20:25 manybubbles: restarting elastic1021 to pick up new plugins
20:21 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedias to 1.25wmf7
20:13 logmsgbot: reedy Finished scap: testwiki to 1.25wmf8 and build l10n cache (duration: 53m 57s)
19:37 hoo: Made myself oauthadmin on mediawikiwiki
19:19 logmsgbot: reedy Started scap: testwiki to 1.25wmf8 and build l10n cache
19:05 mutante: installing package upgrades on bast1001 (incl. PHP version)
19:04 mutante: installing package upgrades on iron
18:38 YuviPanda: turned off yurik's zerosms cronjob on stat1002 (already discussed with him, he was ok with it being stopped until he could find time to fix it)
17:58 _joe_: gracefulling apache on problematic API hosts
17:05 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 04s)
16:51 logmsgbot: anomie Synchronized php-1.25wmf7/extensions/SecurePoll/: SWAT: SecurePoll fix for jump-text and title on create/edit gerrit:172718 (for real this time) (duration: 00m 09s)
16:48 logmsgbot: anomie Finished scap: SWAT: SecurePoll fix for jump-text and title on create/edit gerrit:172718 (duration: 22m 13s)
16:26 logmsgbot: anomie Started scap: SWAT: SecurePoll fix for jump-text and title on create/edit gerrit:172718
23:22 paravoid: reprepro: include src:libmaxminddb, src:geoipupdate for precise/trusty
22:14 cscott: updated Parsoid to version b61475196
21:49 cscott: updated OCG to version d9855961b18f550f62c0b20da70f95847a215805
21:36 mutante: powercycling frozen stat1002
18:42 manybubbles: restarting remaining elasticsearch boxes in sequence to pick up new plugins
18:30 godog: reboot db1017 to pick up an updated kernel
18:29 logmsgbot: ori Synchronized php-1.25wmf6/includes/ChangeTags.php: Iec9befeba: Hide HHVM tag on Special:{Contributions,RecentChanges,...} (duration: 00m 05s)
18:29 logmsgbot: ori Synchronized php-1.25wmf7/includes/ChangeTags.php: Iec9befeba: Hide HHVM tag on Special:{Contributions,RecentChanges,...} (duration: 00m 06s)
17:52 manybubbles: restart elastic1002 to pick up new plugins
17:16 manybubbles: elastic1001 finished restarting. letting it soak up shards for a few minutes to make sure restart was ok. then we'll plow through the others
17:02 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 05s)
16:52 manybubbles: restarting elastic1001 to pick up new plugins.
16:50 manybubbles: deployed new versions of elasticsearch plugins to fix regex querying
19:12 _joe_: restarted apache on mw1192, this time a hard restart
17:11 hoo: mw1192 stuck with almost no idle workers as most workers are in the "Gracefully finishing" state. Attempted to gracefully restart it, but that (to no surprise) didn't help.
November 8
20:17 Krinkle: Jenkins/Zuul was still stuck. Disconnected and relaunched slave agents on lanthanum and gallium. This fixed it (slaves in labs were fine).
20:01 Krinkle: Jenkins/Zuul appear stuck. Disconnect/Re-enable Gearman from Jenkins.
15:30 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: Gerrit I46b151ff: Reverting addition of Draft namespace to enwiki (duration: 00m 04s)
11:09 YuviPanda: ran makelost+found on /srv/postgres on labsdb1004 to kill cronspam
11:07 YuviPanda: ran makelost+found on /srv/postgres on labsdb1007 to kill cronspam
01:21 logmsgbot: ori Synchronized php-1.25wmf5/extensions/MobileFrontend: Ic82ba72b98: Update MobileFrontend for cherry-picks (duration: 00m 04s)
01:21 logmsgbot: ori Synchronized php-1.25wmf6/extensions/MobileFrontend: Ic26f56c0d: Update MobileFrontend for cherry-picks (duration: 00m 05s)
01:12 ^d|voted: elastic1022: banned from allocation since it's unreachable. just in case it starts flapping.
01:11 mutante: elastic1022 - eth0: <NO-CARRIER
01:07 ori: upgrading HHVM app servers to 3.3.0+dfsg1-1+wm2
01:02 mutante: powercycling elastic1022
00:45 ^d|voted: elasticsearch: rebuilding all cirrus indexes for all wikis from a screen on terbium, going to take awhile. should be boring, but if causing problems kill it first and then find me.
00:24 logmsgbot: demon Synchronized php-1.25wmf6/includes/parser/Parser.php: (no message) (duration: 00m 04s)
00:23 logmsgbot: demon Synchronized php-1.25wmf6/includes/parser/CoreTagHooks.php: (no message) (duration: 00m 04s)
00:23 logmsgbot: demon Synchronized php-1.25wmf5/includes/parser/Parser.php: (no message) (duration: 00m 04s)
00:23 logmsgbot: demon Synchronized php-1.25wmf5/includes/parser/CoreTagHooks.php: (no message) (duration: 00m 05s)
00:22 logmsgbot: demon Synchronized php-1.25wmf6/extensions/CirrusSearch/maintenance/updateOneSearchIndexConfig.php: (no message) (duration: 00m 04s)
00:22 logmsgbot: demon Synchronized php-1.25wmf5/extensions/CirrusSearch/maintenance/updateOneSearchIndexConfig.php: (no message) (duration: 00m 04s)
00:04 logmsgbot: demon Synchronized wmf-config/CirrusSearch-common.php: (no message) (duration: 00m 04s)
00:03 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 07s)
November 4
19:22 cmjohnson: rebooting wtp1023
17:21 ejegg: updated crm from b8a1fa98b5d9252d708090c99b61fd22ebe8d2be to e9e81a828d50e8bddf98eae699c925e09b25927b
23:01 cscott: reconfigured OCG logstash path to use bunyan. The _type field is currently missing (used to be "OfflineContentGenerator"). Will fix tomorrow.
22:32 cscott: updated OCG to version 5834af97ae80382f3368dc61b9d119cef0fe129b
15:25 logmsgbot: demon Synchronized wmf-config/CirrusSearch-production.php: (no message) (duration: 00m 04s)
14:59 logmsgbot: demon Synchronized php-1.25wmf6/extensions/CirrusSearch: (no message) (duration: 00m 04s)
14:59 logmsgbot: demon Synchronized php-1.25wmf5/extensions/CirrusSearch: (no message) (duration: 00m 04s)
14:56 _joe_: rotated logs on ocg1001, restarted both ocg and rsyslog
14:23 akosiaris: update DNS/NTP settings, add codfw on nas1001-a,b
13:27 manybubbles: reenable was uneventful. good news.
13:25 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: reenable cirrus everywhere where it has been after the outage has passed (duration: 00m 03s)
12:41 manybubbles: reenabled cirrus as betafeature - no spike in error logs
12:32 logmsgbot: manybubbles Synchronized wmf-config/: Disable Cirrus accelerated regexes as we *think* they might be causing outages (duration: 00m 04s)
12:31 manybubbles: restart of elasticsearch nodes got them back to responsive. Cluster isn't fully healed yet but we're better than we were. Still not sure how we got this way
12:26 manybubbles: restarting all elasticsearch boxes in quick sequence. when I try restarting a frozen box another one freezes up (probably an evil request being retried on it after its buddy went down).
11:46 manybubbles: heap dumps aren't happening. Even with the config to dump them on oom errors. Restarting Elasticsearch nodes to get us back to stable and going to have to investigate from another direction.
11:30 manybubbles: restarting gmond on elasticsearch nodes so I can get a clearer picture of them
11:24 logmsgbot: oblivian Synchronized wmf-config/InitialiseSettings.php: ES is down, long live lsearchd (duration: 00m 09s)
10:52 godog: restarting elasticsearch on elastic1031, heap exhausted at 30G
23:56 awight: update civicrm from 1f0dc2ce0ab84765c085cc0ee369a7a047c0d005 to f47ed6f7e55946388db1dde787ca458c27a57c5a
23:08 logmsgbot: demon Synchronized php-1.25wmf6/extensions/CirrusSearch: (no message) (duration: 00m 04s)
23:08 logmsgbot: demon Synchronized php-1.25wmf5/extensions/CirrusSearch: (no message) (duration: 00m 05s)
19:02 cmjohnson: powering off elastic1009-1002 to replace ssds
18:35 mutante: restarting nginx on toollabs webproxy
18:35 manybubbles: unbanning elastic1006 now that it is properly configured
17:54 _joe_: synchronized downsizing to 5%
17:54 logmsgbot: oblivian Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 06s)
17:42 _joe_: rolling restarted hhvm appservers
17:38 hashar: Zuul seems to be happy. Reverted my lame patch to send Cache-Control headers; since we have a cache breaker, it is not needed.
17:21 bd808: 10.64.16.29 is db1040 in the s4 pool
17:18 bd808: "Connection error: Unknown error (10.64.16.29)" 1052 in last 5m; 2877 in last 15m
17:16 hashar: Upgrading Zuul to have the status page emit a Cache-Control header bug 72766 wmf-deploy-20141030-1..wmf-deploy-20141030-2
17:11 bd808: Upgraded kibana to v3.1.1 again. Better testing now that logstash is working.
17:01 bd808: Logs on logstash1003 showed "Failed to flush outgoing items <Errno::EBADF: Bad file descriptor - Bad file descriptor>" on shutdown. Maybe something not quite right about elasticsearch_http plugin?
16:22 manybubbles: moving shards off of elastic1003 and elastic1006 so they can be restarted. elastic1003 needs hyperthreading and elastic1006 needs noatime.
16:17 cmjohnson: powering off elastic1015-16 to replace ssds
16:04 hashar: restarted Zuul with upgraded version ( wmf-deploy-20140924-1..wmf-deploy-20141030-1 )
14:04 godog: reload swift frontend in eqiad after password rotation
14:04 logmsgbot: demon Synchronized wmf-config/PrivateSettings.php: (no message) (duration: 00m 04s)
13:48 logmsgbot: manybubbles Synchronized php-1.25wmf5/extensions/CirrusSearch/: (no message) (duration: 00m 05s)
13:47 logmsgbot: manybubbles Synchronized php-1.25wmf4/extensions/CirrusSearch/: (no message) (duration: 00m 11s)
01:01 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: Turn Cirrus back on basically everywhere. If Elasticsearch freaks out again just revert I73ae276e to get back to lsearchd again (duration: 00m 04s)
00:43 logmsgbot: ori Synchronized php-1.25wmf4/extensions/WikimediaEvents/WikimediaEventsHooks.php: I4adffaa26: Actually unset the HHVM cookie (duration: 00m 03s)
00:43 logmsgbot: ori Synchronized php-1.25wmf5/extensions/WikimediaEvents/WikimediaEventsHooks.php: I4adffaa26: Actually unset the HHVM cookie (duration: 00m 03s)
22:12 logmsgbot: aaron Synchronized wmf-config/CommonSettings.php: Stop GWT wgJobBackoffThrottling values from getting lost (duration: 00m 03s)
20:35 subbu: deployed parsoid sha 617e9e61
20:26 cscott: updated OCG to version 60b15d9985f881aadaa5fdf7c945298c3d7ebeac
20:10 logmsgbot: maxsem Synchronized php-1.25wmf4/extensions/GeoData: GeoData back to normal (duration: 00m 03s)
19:39 manybubbles: after restarting elasticsearch we expected to get memory errors again. no such luck so far....
18:57 manybubbles: completed restarting elasticsearch cluster. now it'll make a useful file on out of memory errors. raised the recovery throttling so it'll recover fast enough to cause oom errors
18:47 logmsgbot: maxsem Synchronized php-1.25wmf4/extensions/GeoData: live hack to disable geosearch (duration: 00m 04s)
18:37 manybubbles: note that this is a restart without waiting for the cluster to go green after each restart. I expect lots of whining from icinga. This will cause us to lose some updates but should otherwise be safe.
18:34 manybubbles: restarting elasticsearch servers to pick up new gc logging and to reset them into a "working" state so they can have their gc problem again and we can log it properly this time.
18:15 logmsgbot: aaron Synchronized wmf-config/CommonSettings.php: Remove obsolete flags (all of them) from $wgAntiLockFlags (duration: 00m 07s)
17:53 cmjohnson: replacing disk /dev/sdl slot 11 ms-be1013
17:37 _joe_: uploaded a version of jemalloc for trusty with --enable-prof
16:31 ^d: elasticsearch: temporarily raised node_concurrent_recoveries from 3 to 5.
15:32 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: Enable Cirrus as secondary everywhere, brings back GeoData (duration: 00m 04s)
15:08 manybubbles: It's unclear how much of the master going haywire is something that'll be fixed in elasticsearch 1.4. They've done a lot of work there on the cluster state communication.
15:07 manybubbles: for posterity 10/18 of the elasticsearch servers had got to the point where they couldn't free any heap. It's currently not clear to me why they did that. This caused the cluster to basically collapse. The master node kept being unable to communicate with anyone because everyone was pausing for multiple minutes between replies. The cluster handshaking couldn't cope with that and promptly got itself into a state where nodes were both part of the cluster and not part of the cluster at the same time. That's bad.
15:03 manybubbles: restarting gmond on all elasticsearch systems because stats aren't updating properly in ganglia and usually that helps
15:02 manybubbles: restarted a bunch of the elasticsearch nodes that had their heap full. wasn't able to get a heap dump on any of them because they all froze while trying to get the heap dump.
14:32 ^d: elasticsearch: disabling replica allocation, less things moving about if we restart cluster
13:47 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: fall back to lsearchd for a bit (duration: 00m 05s)
13:41 logmsgbot: manybubbles Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 05s)
13:29 manybubbles: restarted elasticsearch on elastic1017 - memory was totally full there
13:21 manybubbles: elastic1008 is logging gc issues. restarting it because that might help it
05:04 springle: forced logrotate ocg1001
03:36 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Oct 27 03:36:39 UTC 2014 (duration 36m 38s)
02:27 logmsgbot: LocalisationUpdate completed (1.25wmf5) at 2014-10-27 02:27:45+00:00
02:17 logmsgbot: LocalisationUpdate completed (1.25wmf4) at 2014-10-27 02:17:08+00:00
October 26
23:46 Krinkle: Force restarted Zuul
15:14 Krinkle: Jenkins/Zuul is stuck as of 20 hours ago
15:06 _joe_: restarted hhvm on mw1114, memory nearly exhausted
03:36 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sun Oct 26 03:36:20 UTC 2014 (duration 36m 19s)
02:25 logmsgbot: LocalisationUpdate completed (1.25wmf5) at 2014-10-26 02:25:47+00:00
02:15 logmsgbot: LocalisationUpdate completed (1.25wmf4) at 2014-10-26 02:15:12+00:00
October 25
22:49 paravoid: upgrading JunOS on cr1-ulsfo
22:32 paravoid: scheduling downtime for all ulsfo -lb- & cr1/2-ulsfo
21:30 logmsgbot: ori Synchronized php-1.25wmf5/extensions/CentralNotice/CentralNotice.hooks.php: Iee2072ac7: Make sure we declare globals before using them (duration: 00m 06s)
21:30 logmsgbot: ori Synchronized php-1.25wmf4/extensions/CentralNotice/CentralNotice.hooks.php: Iee2072ac7: Make sure we declare globals before using them (duration: 00m 06s)
20:41 bd808: updated logstash-* labs instances to salt minion 2014.1.11 (thanks for the ping apergos)
14:43 apergos: all active labs instances now running salt minion 2014.1.11 except for: logstash-* (have their own master), fabapi (pingable, can't ssh on), upload-wizard (running oneiric, not setting up a repo for that!). instances shutoff or w/ nova error were left untouched
03:46 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Sat Oct 25 03:46:48 UTC 2014 (duration 46m 47s)
02:29 logmsgbot: LocalisationUpdate completed (1.25wmf5) at 2014-10-25 02:29:29+00:00
02:18 logmsgbot: LocalisationUpdate completed (1.25wmf4) at 2014-10-25 02:18:14+00:00
00:27 awight: updated DjangoBannerStats from cf5a875d49f4c4cf229d7f864a73d4c2f588ebf9 to a3038f133d64c737d3987bd1c37a987fd3003dd6
October 24
22:40 akosiaris: puppet disabled on uranium, do not enable
20:52 andrewbogott: revived virt1006 on a probationary basis. It's running compute but is disabled so new instances won't be scheduled there. I've moved a few test instances there to see how it behaves.
20:36 logmsgbot: andrew Synchronized wmf-config/wikitech.php: (no message) (duration: 00m 04s)
07:25 _joe_: restarted hhvm on mw1114, depooled the server
05:39 Tim: on mw1189 testing some URLs at a high rate, attempting to induce measurable memory leak
05:06 Tim: reverted unexplained uncommitted modification of palladium:/srv/pybal-config/pybal/eqiad/api which repooled mw1189
03:41 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Oct 23 03:41:49 UTC 2014 (duration 41m 48s)
03:06 logmsgbot: cscott Synchronized wmf-config/filebackend.php: fix using a file from commons with file name length between 140 and 159 (duration: 00m 20s)
16:43 apergos: all precise hosts salt updated to 2014.1.11, this includes tin (deployment) and virt1000 (salt master for labs). Not updated: virt1006 (inaccessible)
14:26 paravoid: cr1-ulsfo: deactivating ospf/ospf3 on GTT ulsfo-eqiad link
13:26 paravoid: cr2-ulsfo: "request chassis mic {off,on}line fpc-slot 1 mic-slot 1" to reboot broken card
12:28 apergos: upgraded salt master (plus minion) on palladium to 2014.1.11, all new precise installs will get this version now, other minion upgrades to follow shortly
11:51 godog: temporarily stopped ircecho/icinga-wm on neon, shower of alarms
11:42 godog: killed stray/old copy of diamond that was filling up conntrack on virt1000
09:53 akosiaris: restarted ocg on ocg1001, ocg1002, ocg1003
07:07 _joe_: rolling restart of ocg services
04:29 springle: removed old /var/log/ocg* on ocg1001 and ocg1003 and forced logrotate, / space critical
03:42 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Oct 20 03:42:24 UTC 2014 (duration 42m 23s)
03:13 logmsgbot: LocalisationUpdate completed (1.25wmf4) at 2014-10-17 03:13:33+00:00
02:39 logmsgbot: LocalisationUpdate completed (1.25wmf3) at 2014-10-17 02:39:26+00:00
01:16 mutante: powering up server formerly known as cp1001
00:47 logmsgbot: hoo Synchronized php-1.25wmf4/extensions/Wikidata/: Fix ORMTable usage, IE 11 freeze bug and adapt to further core changes (duration: 00m 14s)
October 16
23:07 mutante: RT - removed global permission for privileged users to create tickets - should not affect anyone because users are either not privileged or get this from other groups - need it to be flexible about readonly queues in RT - let me know if any issues
19:05 ottomata: deployed webstatscollector 0.4 on oxygen (filter) and gadolinium (collector)
18:46 ori: adjusting pybal weight for mw1114 back up to 20 to confirm that leak is in luasandbox
17:57 ori_: installed lua5.1 on mw1114 so i can switch scribunto to luastandalone and thus potentially isolate the leak to luasandbox
15:38 andrewbogott: running sync-common on virt1000
15:36 logmsgbot: marktraceur Synchronized php-1.25wmf3/extensions/OpenStackManager/: [SWAT] [wmf3] Make list=novainstances available to anons (duration: 00m 06s)
15:36 logmsgbot: marktraceur Synchronized php-1.25wmf2/extensions/OpenStackManager/: [SWAT] [wmf2] Make list=novainstances available to anons (duration: 00m 05s)
15:22 paravoid: AMS-IX renumbering: remove old IP from interface, migration over; > 75% of total peers migrated, accounting for much more bandwidth/routes
15:13 logmsgbot: marktraceur Synchronized php-1.25wmf3/extensions/VisualEditor/modules/ve-mw/ui: [SWAT] [wmf3] Update OOjs UI to v0.1.0-pre (d74a46ca6a) and VisualEditor-MediaWiki to Ie06056b (duration: 00m 05s)
15:12 logmsgbot: marktraceur Synchronized php-1.25wmf3/resources/lib/oojs-ui: [SWAT] [wmf3] Update OOjs UI to v0.1.0-pre (d74a46ca6a) and VisualEditor-MediaWiki to Ie06056b (duration: 00m 06s)
14:32 paravoid: AMS-IX renumbering: move all remaining ASNs to the new space
14:20 Coren: Not reimaging mw1035 after all; hhvm is in our base, killing our ramz.
14:16 paravoid: AMS-IX renumbering: peering with (renumbered) top-10 ASNs + ASNs with large number of prefixes
14:12 Coren: reimaging mw1035 for great justice!!! (HHVM)
14:11 _joe_: powercycling mw1205, down since this morning, console blank
13:29 logmsgbot: manybubbles Synchronized wmf-config/CirrusSearch-common.php: Add configuration so cirrus can build an index to speed up regex searches (duration: 00m 04s)
11:22 godog: rolling restart of container-server on ms-be1*
11:13 godog: rolling restart of container-server on ms-be2*
07:12 legoktm: running migratePass0 across all wikis
06:49 _joe_: load testing done
06:22 _joe_: doing some load testing on HHVM (api)
04:38 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Fri Oct 10 04:38:38 UTC 2014 (duration 38m 37s)
04:22 ^d: elasticsearch upgrade from 1.3.2 -> 1.3.4 complete for all 18 nodes. Sporadic icinga warnings about health should go away now
04:19 springle: upgrade db1046 mariadb 10
03:44 springle: enable purging of old eventlogging data from specific tables on m2-master, as per analytics@ discussion
03:12 logmsgbot: LocalisationUpdate completed (1.25wmf3) at 2014-10-10 03:12:51+00:00
02:39 logmsgbot: LocalisationUpdate completed (1.25wmf2) at 2014-10-10 02:39:23+00:00
00:53 legoktm: running initSiteStats.php on wikidatawiki
00:47 legoktm: ran updateArticleCount.php --wiki=ckbwiki (bug 71884)
October 9
23:57 logmsgbot: maxsem Synchronized php-1.25wmf2/resources/: (no message) (duration: 00m 03s)
23:57 logmsgbot: maxsem Synchronized php-1.25wmf3/resources/: (no message) (duration: 00m 04s)
23:55 logmsgbot: maxsem Synchronized php-1.25wmf2/extensions/OpenStackManager/: (no message) (duration: 00m 04s)
23:52 logmsgbot: maxsem Synchronized php-1.25wmf3/extensions/MobileApp: (no message) (duration: 00m 03s)
23:50 logmsgbot: maxsem Synchronized php-1.25wmf2/extensions/MobileApp: (no message) (duration: 00m 04s)
23:45 logmsgbot: maxsem Synchronized php-1.25wmf2/extensions/Flow/: (no message) (duration: 00m 09s)
23:43 logmsgbot: maxsem Synchronized php-1.25wmf3/extensions/Wikidata/: (no message) (duration: 00m 10s)
23:40 logmsgbot: maxsem Synchronized php-1.25wmf2/extensions/Wikidata/: (no message) (duration: 00m 10s)
22:24 K4-713: updated payments wiki to 17f822a64742bd13e
20:33 subbu: deployed parsoid version 644071d2
20:03 Jeff_Green: rebooting samarium
19:51 logmsgbot: reedy Synchronized wmf-config/: (no message) (duration: 00m 15s)
19:42 manybubbles: upgrading elastic1014
19:34 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.25wmf3
19:31 logmsgbot: reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikipedias to 1.25wmf2
19:21 logmsgbot: reedy Finished scap: testwiki to 1.25wmf3 and build l10n cache (take 2) (duration: 30m 00s)
18:51 logmsgbot: reedy Started scap: testwiki to 1.25wmf3 and build l10n cache (take 2)
18:51 bd808: cherry-picked I3ae9edab2505c37945fe66863721913a6d33223c to scap
18:42 logmsgbot: reedy scap failed: TypeError bufsize must be an integer (duration: 08m 33s)
18:34 logmsgbot: reedy Started scap: testwiki to 1.25wmf3 and build l10n cache
17:56 Coren: begin reimaging of mw1027
17:55 Coren: done reimaging of mw1028. Now hhvm_appserver
16:58 _joe_: gracefully restarted again api apaches to recover 500s
16:43 godog: re-enable puppet on ms-fe/ms-be in eqiad
16:39 godog: re-enable puppet on ms-fe/ms-be in codfw
16:23 logmsgbot: oblivian gracefulled all apaches
15:34 hashar: restarted Zuul
15:31 Coren: begin reimaging of mw1028
15:31 Coren: done reimaging of mw1029. Now hhvm_appserver
15:26 andrewbogott: upgraded wikitech-static to 1.25wmf2
14:02 akosiaris: updated pybal on palladium for citoid
13:54 Coren: begin reimaging of mw1029
12:01 logmsgbot: reedy Purged l10n cache for 1.24wmf22
11:59 springle: converted some librenms tables to innodb on db1001 m1-master. should be a no-op
11:57 springle: xtrabackup db1016 to db2010
11:39 manybubbles: starting upgrade of elastic1009
11:11 _joe_: reenabled puppet on mw*
11:11 godog: disabled puppet in ms-fe/ms-be in eqiad/codfw to merge container-sync changes
10:35 _joe_: disabling puppet on most mw* hosts while testing apache changes
08:17 _joe_: repooling mw102[3-5],mw1053 in the hhvm pool
07:15 _joe_: reimaging mw102[3-5] to hhvm
07:02 _joe_: reinstalling mw1053
03:33 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Thu Oct 9 03:33:47 UTC 2014 (duration 33m 46s)
02:30 logmsgbot: LocalisationUpdate completed (1.25wmf2) at 2014-10-09 02:30:03+00:00
02:17 logmsgbot: LocalisationUpdate completed (1.25wmf1) at 2014-10-09 02:17:53+00:00
October 8
23:31 mutante: importing xml dump to cawikimedia
23:28 andrewbogott: running sync-common on virt1000
23:27 logmsgbot: demon Synchronized php-1.25wmf2/extensions/OpenStackManager: (no message) (duration: 00m 06s)
23:23 logmsgbot: demon Synchronized php-1.25wmf2/extensions/Flow: (no message) (duration: 00m 05s)
23:14 logmsgbot: demon Synchronized php-1.25wmf2/extensions/CommonsMetadata: (no message) (duration: 00m 06s)
22:40 mutante: virt0 - deleted salt key, revoked puppet cert, removed from site.pp
22:39 Reedy: Removed openjdk-6-* from logstash100[1-3]
22:07 subbu: updated OCG to version def24eca
22:07 mutante: tin - deleted empty pmtpa dsh group files
22:02 mutante: tin - there are dozens of dsh groups that have been removed from repo long time ago but never got purged, but it isn't easy to tell what might still be used, so deleting all and letting puppet recreate might be risky?
20:48 legoktm: currently running /home/legoktm/fixBug71749.php on terbium
19:49 Reedy: logstash upgraded to 1.4.2-1 on logstash100[1-3]
19:46 Reedy: Created flow tables on officewiki
17:04 _joe_: load testing done
16:44 _joe_: doing some load testing on the hhvm servers
16:09 Reedy: elasticsearch upgraded on logstash1001 to 1.3.4
16:07 Reedy: elasticsearch upgraded on logstash100[23] to 1.3.4
14:20 hasharBusy: disabled puppet on gallium to make sure a zuul config change sticks. 165481
14:19 manybubbles: fixed missing elasticsearch extension jar file and brought elastic1001 back up. git fat betrayed us.
14:14 hasharBusy: hard restarting zuul
14:03 manybubbles: upgrading elastic1001 uncovered a bug in our highlighter that I have yet to diagnose. I removed that server from the rotation so we'll continue to use the old version.
10:53 godog: restart commons swiftrepl from ms-fe1003 and non-commons from ms-fe1004 to avoid maxing out copper's nic
10:09 godog: start swiftrepl of commons originals eqiad -> codfw
09:56 godog: start swiftrepl of non-commons originals eqiad -> codfw
06:02 logmsgbot: ori Synchronized php-1.25wmf1/includes/objectcache/HashBagOStuff.php: I0b0b5f01: HashBagOStuff: use the value itself as the CAS token (duration: 00m 06s)
06:02 logmsgbot: ori Synchronized php-1.25wmf2/includes/objectcache/HashBagOStuff.php: I0b0b5f01: HashBagOStuff: use the value itself as the CAS token (duration: 00m 07s)
03:28 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Tue Oct 7 03:28:09 UTC 2014 (duration 28m 8s)
02:26 logmsgbot: LocalisationUpdate completed (1.25wmf2) at 2014-10-07 02:26:04+00:00
02:14 logmsgbot: LocalisationUpdate completed (1.25wmf1) at 2014-10-07 02:14:21+00:00
01:16 logmsgbot: ori Synchronized php-1.25wmf1/extensions/Wikidata: Ie92da71 / I44f1dce: Update Wikidata, fixes for serialization issues (duration: 00m 09s)
01:15 logmsgbot: ori Synchronized php-1.25wmf2/extensions/Wikidata: Ie92da71 / I44f1dce: Update Wikidata, fixes for serialization issues (duration: 00m 10s)
00:34 Tim: core dumps were enabled on mw1088, unexpectedly started gathering natural segfault traffic
16:19 logmsgbot: ori Synchronized wmf-config/CommonSettings.php: Set $wgPercentHHVM to 1 (duration: 00m 27s)
15:54 cscott: updated OCG to version aee3712b352f51f96569de0bcccf3facf654e688
15:45 GroggyPanda: deleted graphite data for deployment-rsync02 by hand on labmon1001, since instance has been dead. Need to move to shinken + dynamic host.cfg
15:22 logmsgbot: manybubbles Synchronized wmf-config/: SWAT Add tracking categories for files with attribution problems (duration: 00m 06s)
15:19 cscott: ran 'sudo -u ocg -g ocg nodejs-ocg scripts/run-garbage-collect.js -c /home/cscott/config.js' from /home/cscott/ocg/mw-ocg-service in order to clear caches (working around https://gerrit.wikimedia.org/r/164644 ) on ocg100x.eqiad.wmnet
14:51 cmjohnson1: disconnecting Tampa servers
13:46 godog: starting test swiftrepl run on wikibooks eqiad -> codfw
11:49 _joe_: done restarting ocg servers
11:34 _joe_: rolling restart and cleaning of ocg nodes, trying to unlock pdf generation
11:11 mark: Shutdown tarin
11:11 mark: Shutdown sanger
09:27 _joe_: cleaned ocg another time
09:07 mark: Stopped dovecot on sanger
08:06 _joe_: cleaned ocg1001, again
05:57 Nemo_bis: "book creator seems stuck": PDF servers at 97 % CPU, little traffic, enough disk free for about 1 day more
03:26 logmsgbot: LocalisationUpdate ResourceLoader cache refresh completed at Mon Oct 6 03:26:31 UTC 2014 (duration 26m 30s)
20:30 cscott: (the above was on ocg100x.eqiad.wmnet)
20:30 cscott: ran 'sudo -u ocg -g ocg nodejs-ocg scripts/run-garbage-collect.js -c /home/cscott/config.js' from /home/cscott/ocg/mw-ocg-service in order to clear caches (working around https://gerrit.wikimedia.org/r/164644 )
20:23 andrewbogott: restarted zuul
20:22 logmsgbot: ori Synchronized wmf-config/mc.php: Ie1ed821a7: Set bloom cache config (duration: 00m 03s)
19:34 ori: when running puppet merge: fatal: Unable to create '/var/lib/git/operations/puppet/.git/refs/remotes/origin/production.lock': File exists.
19:28 hashar: Restarting Zuul sorry :-/
19:26 hashar: Zuul in some kind of death loop
19:05 mutante: purging old 'searchqa' scripts and logs from iron (gerrit 164429 removes from puppet)
18:46 mutante: restored dbtree from manual backup (should have been synced by scripts)
18:25 logmsgbot: aaron Synchronized php-1.25wmf2/extensions/CentralAuth: (no message) (duration: 00m 05s)
21:43 logmsgbot: demon Synchronized wmf-config/InitialiseSettings.php: Disabling prerendering of images from this morning's swat (duration: 00m 04s)
21:33 bblack: added LVS BGP config setup to cr[12]-codfw
20:46 logmsgbot: ori Synchronized php-1.24wmf22/extensions/NavigationTiming: Update NavigationTiming for cherry-picks (duration: 00m 03s)
20:44 logmsgbot: ori Synchronized php-1.25wmf1/extensions/NavigationTiming: Update NavigationTiming for cherry-picks (duration: 00m 04s)
20:11 bblack: stopping -> starting uwsgi/apache -type stuff on tungsten
19:55 aude: populated sites table for fawikivoyage
19:27 bd808: hosts that failed Trebuchet update of scap: virt0.wikimedia.org, fenari.wikimedia.org, mw1110.eqiad.wmnet, mw1053.eqiad.wmnet. mw1053.eqiad.wmnet only failed checkout
19:23 bd808: Trebuchet reports for scap sync: "231/234 minions completed fetch; 230/234 minions completed checkout". Some stale entries need to be removed from Trebuchet redis cache
19:21 bd808: Updated scap to eff0d01 (Fix format specifier for error message)
06:00 ori: mw1053 still flooding error logs with "Unrecognized job type 'EchoNotificationDeleteJob'." Disabling Puppet and jobrunner for now, planning to investigate during SF daytime hours.
20:32 hoo: Ran sync-common on mw1053 to stop "Unrecognized job type 'EchoNotificationDeleteJob'." exceptions
20:27 cscott: updated OCG to version 48c495e3656f528abe636ce0cd7562270505534f
19:40 logmsgbot: yurik Synchronized php-1.25wmf1/extensions/ZeroBanner/: (no message) (duration: 01m 05s)
19:38 logmsgbot: yurik Synchronized php-1.24wmf22/extensions/ZeroBanner/: (no message) (duration: 01m 03s)
19:17 mutante: fenari - removed from dsh - rejoice deployers, should be faster now
19:03 logmsgbot: yurik Synchronized php-1.24wmf22/extensions/ZeroBanner/: (no message) (duration: 01m 08s)
18:59 logmsgbot: yurik Synchronized php-1.24wmf22/extensions/ZeroBanner/: (no message) (duration: 01m 09s)
18:56 logmsgbot: yurik Synchronized php-1.24wmf22/extensions/ZeroBanner/: (no message) (duration: 01m 04s)
18:34 Krinkle: (..jenkins) The command runs fine when done in that workspace from shell. Looks like a bug with Jenkins Java abstraction layer.
18:28 ori: disabling puppet on mw1019 to test impact of ProxyBadHeader apache directive
18:25 Krinkle: Jenkins jobs for repos with git submodules broken ("git-submodule: git reset: not found")
17:41 andrewbogott: graceful'd apache on logstash1001 logstash1002 logstash1003
16:56 cmjohnson1: swapping disk db1020
16:32 godog: reverted change to syslog.eqiad.wmnet, back to nfs-home.pmtpa.wmnet
15:40 JetLaggedPanda: purged graphite logs for deployment-mediawiki04 by hand on labmon1001 to prevent it from causing issues on icinga, since the instance was deleted previously
15:32 ottomata: starting upgrade of stat1002 from precise to trusty