15:10 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 33410 - Allow IP edits on wikimania2012.wikimedia.org'
08:29 apergos: live hack on db31: added nagios ALL=NOPASSWD: /usr/bin/arcconf getconfig 1 to sudoers to shut up security warnings, this is a bad approach though
08:27 apergos: live hack on hume: added ". /etc/apache2/envvars.default" to /etc/apache2/envvars; w/o it apache wouldn't start, which killed static.wp.o
07:20 apergos: from 1 hour and 20 mins ago: Ryan_Lane: virt4 is back up. br103 had a bad MAC of 00...
05:28 Ryan_Lane: rebooting virt4 again
05:27 Ryan_Lane: rebooting all migrated instances
05:11 Ryan_Lane: virt4 has some networking issue
05:11 Ryan_Lane: migrating all instances off of virt4 onto virt3 and virt2
04:29 Ryan_Lane: restarting virt4 because upstart is an unreliable POS
02:53 Ryan_Lane: enabled instance live migration on virt2-4
02:01 logmsgbot: LocalisationUpdate completed (1.18) at Thu Dec 29 02:05:01 UTC 2011
December 28
23:00 binasher: db31 wasn't puppetized, fixed
22:49 logmsgbot: catrope synchronized php-1.18/extensions/MoodBar/ 'updating to trunk state'
22:49 logmsgbot: catrope synchronized php-1.18/extensions/MarkAsHelpful/ 'updating to trunk state'
22:26 logmsgbot: catrope synchronized php-1.18/extensions/MoodBar 'Forgot to sync this earlier'
21:25 logmsgbot: catrope synchronized wmf-config/InitialiseSettings.php 'Enable MarkAsHelpful on enwiki'
16:02 Jeff_Green: hume:/usr/local/bin/copy_impression_logs_from_storage3.pl deprecated, t'was an emergency hack deployed when storage3 dropped a disk
14:51 mutante: fixed mobile traffic logger checks - what they report is for real now. 2 procs on first server, 4 procs on the other three
07:33 apergos: live-hacked /usr/local/bin/copy_impression_logs_from_storage3.pl on hume, it was rsyncing everything into /a/static/uncompressed/2... do we need this job? there is also /usr/local/bin/offhost_backups on storage3 that seems to copy to the same dir, can whoever set this up take a look?
07:03 apergos: maybe the cron job on these image scalers should run more often... cleaned up /tmp on srv221
02:01 logmsgbot: LocalisationUpdate completed (1.18) at Wed Dec 28 02:04:37 UTC 2011
16:49 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Part of bug 33137 Enable GoogleNewsSitemap on ptwikinews'
16:46 logmsgbot: reedy synchronized closed.dblist 'Locking readerfeedback.l.w.o per bug 33229'
16:42 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 33042 - Set wgBabelMainCategory to false for nlwiki'
07:51 apergos: manually removed log files older than 14 days from /a/static/uncompressed on hume; looks like there's a problem with the script /usr/local/bin/offhost_backups on storage3, which was modified recently; guess the rsync delete option isn't being picked up
02:02 logmsgbot: LocalisationUpdate completed (1.18) at Sat Dec 24 02:06:03 UTC 2011
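Note on the 07:51 cleanup above: a minimal sketch of how "log files older than 14 days" could be listed before deleting anything. The function name `files_older_than` and its parameters are hypothetical and not taken from the offhost_backups script; it only reports candidates and removes nothing.

```python
import os
import time


def files_older_than(root, days, now=None):
    """Return sorted paths under root whose mtime is more than `days` days before `now`."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400  # 86400 seconds per day
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) < cutoff:
                stale.append(path)
    return sorted(stale)


if __name__ == "__main__":
    # Assumption: the directory named in the log entry; adjust as needed.
    for path in files_older_than("/a/static/uncompressed", 14):
        print(path)
```

Listing first and deleting in a separate step keeps the dry run cheap to review when the rsync delete option is suspected of misbehaving.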
16:27 mutante: first interesting syslog line when it started: formey kernel: [39665413.570024] INFO: task kswapd0:36 blocked for more than 120 seconds.
16:10 mutante: gerrit and svn back up
16:07 mutante: gerrit stopped working and formey would still ping but no ssh connect and no mgmt output, powercycling formey
02:07 binasher: db9 maintenance completed
02:02 logmsgbot: LocalisationUpdate completed (1.18) at Fri Dec 23 02:05:19 UTC 2011
01:56 binasher: starting db9 maintenance - services will be unavailable for approx 15 minutes
00:08 logmsgbot: hashar synchronized php-1.18/extensions/WikimediaIncubator/IncubatorTest.php 'deploy r107120 - (bug 32772) fix up MoodBar and WikiLove on Wikimedia Incubator'
December 22
23:43 maplebed: put owa1-3 in as container servers, took ms1-3 out for pmtpa test swift cluster
17:05 mutante: spence: according to [1] we should even double that if we have "high latency values (> 10 or 15 seconds)" and we have like > 1000
17:04 mutante: spence: check out "nagios -s /etc/nagios/nagios.cfg" for performance data - it suggests "Value for 'max_concurrent_checks' option should be >= 1231"
16:55 Jeff_Green: manually rotated spence:/var/log/nagios/nagios.log because nagios log rotation appears broken and the file is ~2.6G
16:32 logmsgbot: catrope synchronized wmf-config/InitialiseSettings.php 'Underscores -> spaces in wmgArticleFeedbackBlacklistCategories'
16:14 apergos: restarting scp on ds2, seems that it renegotiates after 64GB and that was failing, fixed
15:25 apergos: thumbs cleaner on ms5 complete. (don't worry, a new job will start up tomorrow)
15:16 mutante: installing security upgrades on tarin (includes perl and php)
14:10 apergos: another couple binlogs gone on ds9
13:41 mutante: added testswarm package to repo and installed it on gallium
13:15 logmsgbot: catrope synchronized wmf-config/InitialiseSettings.php 'Use the correct interwiki prefix'
13:13 logmsgbot: catrope synchronized wmf-config/InitialiseSettings.php 'Configure $wgImportSources on en_labswikimedia'
12:59 logmsgbot: catrope synchronized wmf-config/InitialiseSettings.php 'Whitelist Category:Article_Feedback_5_Additional_Articles for AFTv5 and blacklist it for AFTv4 on enwiki and en_labswikimedia'
10:46 apergos: ds2 scp to ds1 stalled in the same place, looking into it
09:31 apergos: should have logged this earlier, prolly about 2 hours ago removed 3 more bin logs from db9, we were getting crowded again.
09:29 apergos: first attempt at scp from ds2 to ds1 failed after 64gb, nothing useful in log, process on ds1 was hung at "restarting system call"... shot it and running again, from screen as root on ds2.
07:53 apergos: after some playing around on ms5 (which is responsible for the little io utilization spikes, but I'm done now), thumb cleaner is back at work for what should be its last day
06:07 Tim: removed mobile1, srv124, srv159, srv183, srv186 from /etc/dsh/group/apaches: not in mediawiki-installation
05:59 Tim: removed srv162, srv174 from /etc/dsh/group/job-runners: not in puppet jobrunners class
02:22 Ryan_Lane: fixed labsconsole. reverted aws-sdk to 1.4
02:04 logmsgbot: LocalisationUpdate completed (1.18) at Tue Dec 20 02:07:54 UTC 2011
00:52 Ryan_Lane: seems I broke labsconsole :(
00:52 Ryan_Lane: ran svn up on openstackmanager on virt1
00:05 logmsgbot: asher synchronized wmf-config/db.php 'putting db50 into rotation for s6'
December 19
23:59 binasher: started replicating db50 from db47
22:56 binasher: resolved apt issues on db13,17
22:50 binasher: running a hot xtrabackup of db47 to db50
22:17 RobH: db50/db51 online for asher to deploy into s6
22:12 Jeff_Green: deployed two new cron jobs on hume via /etc/cron.d/mw-fundraising-stats, temporary, will puppetize once we see that the script is working properly
21:41 logmsgbot: awjrichards synchronizing Wikimedia installation... : Deploying ContributionReporting fixes to use summary tables (r106696), disabling ContributionReporting everywhere except test and foundationwikis
21:25 logmsgbot: asher synchronized wmf-config/db.php 'pulling db43 which died'
21:12 apergos: note that dataset1 appears to be keeping time ok
21:12 apergos: started an scp (to make ds1 work a tiny bit harder) of some files from ds2. running on ds2 in screen session as root.
18:28 logmsgbot: nikerabbit synchronizing Wikimedia installation... : I18ndeploy r106667 and new extensions on mediawiki.org
16:42 RobH: dataset1 new data partition ready and setup to automount
15:49 RobH: dataset1 reinstalled and has had puppet run. Now to see if it can keep time
15:46 RoanKattouw: maerlant is fried, load avg is 500+, linearly increasing since Friday. Rejects SSH login attempts
15:45 notpeter: restarting indexer on searchidx2
14:16 apergos: thumb cleaner to bed for the night... for the last time?
13:15 mutante: truncated spence.cfg in ./puppet_checks.d/ - it had multiple dupe service definitions for all checks on spence
13:11 mutante: commented check_job_queue stuff from non-puppetized files on spence (hosts.cfg, conf.php) to get rid of "duplicate definition" now that it's been puppetized
12:35 mutante: deleted snapshot4 files from /var/lib/puppet/yaml/node and ./yaml/facts on sockpuppet and stafford, they got recreated and fixed puppet run on sn4
10:08 apergos: a few more binlogs on db9 gone. eeking out another 12 hours or so
06:57 apergos: thumb cleaner awake for the day. poor thing, slaving away but soon it will be able to retire
01:57 logmsgbot: LocalisationUpdate failed (1.18) at Mon Dec 19 02:00:11 UTC 2011
December 18
16:41 notpeter: removing about 4G of binlogs from db9. everything more than 24 hours old.
15:12 apergos: thumb cleaner sleeping it off for the night
07:34 apergos: thumb cleaner to work for the day
01:57 logmsgbot: LocalisationUpdate failed (1.18) at Sun Dec 18 02:00:04 UTC 2011
December 17
22:49 RobH: Anytime db9 hits 98 or 99% someone needs to remove binlogs to bring it back down to 94 or 95%
22:48 RobH: removed older binlogs on db9 again to kick it back to a bit more free space to last the weekend.
17:53 logmsgbot: catrope synchronized wmf-config/CommonSettings.php 'Remove SVN dir setting, this is now passed in on the command line'
16:43 RoanKattouw: Found out why LocalisationUpdate was failing. Would have been fixed already if puppet had been running on fenari, but it's throwing errors. See r1617 and my comment on r1558
14:32 apergos: thumb cleaner to bed for the night... about 2 days left I think
07:25 apergos: thumb cleaner started up for the day
01:57 logmsgbot: LocalisationUpdate failed (1.18) at Sat Dec 17 02:00:18 UTC 2011
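Note on the 22:49 rule above (remove binlogs when db9 hits 98 or 99%, bring it back down to 94 or 95%): a minimal sketch of that arithmetic. The function name `bytes_to_free` and the default thresholds of 98 and 95 are assumptions picked from the ranges in the log entry; the sketch only computes how much space to reclaim and does not touch MySQL or any files.

```python
import shutil


def bytes_to_free(total_bytes, used_bytes, alarm_pct=98.0, target_pct=95.0):
    """Return bytes to remove to reach target_pct, or 0 if usage is below alarm_pct."""
    used_pct = 100.0 * used_bytes / total_bytes
    if used_pct < alarm_pct:
        return 0
    target_used = int(total_bytes * target_pct / 100.0)
    return max(0, used_bytes - target_used)


if __name__ == "__main__":
    # Assumption: "/" stands in for whichever partition holds the binlogs.
    usage = shutil.disk_usage("/")
    print(bytes_to_free(usage.total, usage.used))
```

Using two thresholds rather than one avoids having to clean up again the moment a few more binlogs are written.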
December 16
22:30 RobH: reclaimed space on db9, restarted mysql, services seem to be recovering
22:24 maplebed: restarting mysql on db9; brief downtime for a number of apps (bugzilla, blog, etc.) expected.
22:03 RobH: db9 space reclaimed back to 94% full, related services should start recovering
21:57 RobH: db9 disk full, related services are messing up, fixing
21:56 RobH: kicking apache for bz related issues on kaulen
19:07 logmsgbot: catrope synchronized wmf-config/InitialiseSettings.php 'Set AFTv4 lottery odds to 100% on en_labswikimedia'
18:48 LeslieCarr: removed the ssl* yaml logs on stafford to fix the puppet not running error
16:13 apergos: thumb cleaner to bed for the night. definitely need an alarm clock for this... good thing it's only got about 4 days of backlog left
15:41 RobH: es1002 being actively worked on for hdd controller testing
15:39 RobH: lvs1003 disk dead per RT 1549, will troubleshoot on site later today or Monday
15:32 RobH: lvs1003 unresponsive to serial console, rebooting
15:18 RobH: reinstalling dataset1
14:45 mutante: puppet was broken on all servers including "nrpe" due to package conflict with nagios-plugins-basic i added to base, revert+fix
13:29 RoanKattouw: Dropping and recreating AFTv5 tables on en_labswikimedia and enwiki
13:26 logmsgbot: catrope synchronized php-1.18/extensions/ArticleFeedbackv5/ 'Updating to trunk state'
13:25 mutante: tweaked Nagios earlier today: external command_check_interval & event_broker_options (see comments in gerrit Id3b4a458)
13:01 mark: Found lvs5 and lvs6 with offload-gro enabled, even though it's set disabled in /etc/network/interfaces... corrected
09:21 apergos: restarted lighttpd on ds2, it had stopped (and why didn't nagios tell us?)
08:38 mutante: spence - had killed additional notifications.cgi and history.cgi procs, waited 5 minutes, load went down a lot, restarting nagios
08:22 mutante: spence - almost unusable, Nagios notifications.cgi and history.cgi use a lot of memory, stopping Nagios, watching swap
08:15 mutante: spence slow again, side-note: tried to use "sar" to investigate but "Please check if data collecting is enabled in /etc/default/sysstat" (want to?)
23:47 logmsgbot: catrope synchronized wmf-config/InitialiseSettings.php 'Enable ClickTracking on en_labswikimedia so AFTv5 will work'
23:29 logmsgbot: catrope synchronized wmf-config/InitialiseSettings.php 'Enable AFTv5 on en_labswikimedia'
23:22 logmsgbot: catrope synchronizing Wikimedia installation... : Deploying MoodBar changes for feedback dashboard. Along for the ride: VisualEditor changes by Neil and a Badtitle bug fix by Tim
23:14 RoanKattouw: Created AFTv5 tables on en_labswikimedia and enwiki
23:01 logmsgbot: catrope synchronized wmf-config/CommonSettings.php 'Add trigger code for AFTv5'
22:59 binasher: restarted nagios with enable_environment_macros = 0
22:57 logmsgbot: catrope synchronized wmf-config/InitialiseSettings.php 'Disable AFTv5 for now though'
15:50 logmsgbot: catrope synchronized wmf-config/CommonSettings.php 'Disable DB logging for ClickTracking'
15:45 logmsgbot: catrope synchronized wmf-config/CommonSettings.php 'Send all ClickTracking logs to emery over UDP, including non-enwiki wikis with ClickTracking enabled'
15:44 logmsgbot: LocalisationUpdate failed
15:42 RoanKattouw: Aborted LU run based on /home . Will restart once puppet has moved the LU checkout to /var/lib
15:14 logmsgbot: catrope synchronized wmf-config/InitialiseSettings.php 'Send all ClickTracking logs to emery over UDP, including non-enwiki wikis with ClickTracking enabled'
14:53 RoanKattouw: Running l10nupdate one more time, this time with the 'old' script, pulling from the SVN checkout on /home (this'll be slow)
14:36 mutante: fixed screen permissions on hume earlier today per RT:2133 - Ubuntu bug #390405
14:33 mark: /var/lib/puppet on stafford filled up with puppet reports; cleaned up and installed a cron job to prevent that from happening again
14:31 logmsgbot: LocalisationUpdate completed (1.18) at Wed Dec 14 14:34:27 UTC 2011
13:43 RoanKattouw: Changing ownership of /home/wikipedia/common/php-1.18/cache/l10n to l10nupdate:wikidev
13:40 logmsgbot: LocalisationUpdate failed (1.18) at Wed Dec 14 13:44:01 UTC 2011
13:39 RoanKattouw: Doing a trial run of my new LU script on fenari, from my home directory
13:34 logmsgbot: catrope synchronized php-1.18/extensions/LocalisationUpdate/ 'Update LU to trunk state'
13:31 mark: Moved ms4 to internal
12:08 RoanKattouw: Changing ownership of fenari:/home/wikipedia/l10n to l10nupdate:wikidev , recursively (was: 500:wikidev). TODO: move this off of /home and puppetize it
09:34 p858snake|l: [19:23] <logmsgbot> !log Reedy fixExtLinksProtocolRelative.php is still on frwiki at 182,200
06:59 apergos: thumbs cleaner back at work for the day
04:24 logmsgbot: preilly synchronized php-1.18/extensions/MobileFrontend/ 'weekly update to mobile frontend'
01:32 Reedy: fixExtLinksProtocolRelative is on frwiki, 50,500 run in screen on fenari as reedy. Please shoot overnight if needed
01:31 Reedy: fixExtLinksProtocolRelative is onrofrwiki, 50,500 run in screen on fenari
01:21 logmsgbot: preilly synchronized php-1.18/extensions/MobileFrontend/ 'weekly update to mobile frontend'
01:14 logmsgbot: preilly synchronized php-1.18/extensions/MobileFrontend/ 'weekly update to mobile frontend'
20:34 RoanKattouw: Actually running Puppet on the bits Apaches now
20:28 RoanKattouw: Manually running Puppet on all bits Apaches using dsh
19:58 RoanKattouw: Removed spence from the mediawiki-installation node list
19:45 logmsgbot: neilk synchronizing Wikimedia installation... : deploying VisualEditor, also deploying some new JS libs and associated tests and messages to core which were taken from UploadWizard
22:20 binasher: upgraded xtrabackup in wikimedia-lucid to 1.6.3, changed puppet rule to ensure latest
21:39 binasher: stafford:/var/lib/puppet was full, extended fs by 5GB
21:16 RoanKattouw: Commented out srv159, srv183, srv186 from /etc/dsh/group/mediawiki-installation : these servers have been decommissioned and don't seem to be picking up SSH key changes from puppet
08:14 apergos: chown/grp mwdeploy of /usr/local/apache/common/php/cache on srv150-srv190 to shut up the l10n rsync job
07:56 apergos: on hume installed parsekit via pecl, needed for cron job mb-dump.sh (mood bar) and for a tor-related cron job
07:53 apergos: on hume installed libcrypt-ssleay-perl and (maybe superfluous) libnet-ssleay-perl to fix honeypot cronjob which was relying on an old version in someone's home directory
07:02 apergos: also restarted morebots, it was missing from the channel for some reason
07:01 apergos: thumb cleaner started up on ms5 for the day
December 11
16:57 apergos: thumb cleaner to bed for the night, let it run a bit long but seems like traffic was ok on a Sunday night
16:00 logmsgbot_: reedy synchronized wmf-config/InitialiseSettings.php 'Set EnableDnsBlacklis for thwiki'
15:58 logmsgbot_: reedy synchronized wmf-config/InitialiseSettings.php 'swap wgEnableSorbs for wgEnableDnsBlacklist'
15:49 mark: Restarted formey with lower max clients setting
14:40 apergos: thumb cleaner to bed for the night
13:55 mark: domas fixed max_open_files issue on db9 (too many connections from mchenry), and fixed replication to db10 which I broke by deleting all binlogs
13:37 mark: Removed some binlogs on db9
10:06 logmsgbot_: hashar synchronized wmf-config/CommonSettings.php 'bug 32513 : Collection license link: now use upstream URL instead of a license with ton of wikitext'
09:51 logmsgbot_: hashar synchronized wmf-config/CommonSettings.php 'GFDL is long gone, we use CC-BY-SA 3.0 nowaday. See bug 32513'
23:32 LeslieCarr: on cr1-sdtpa and cr2-pmtpa, routing labs a 208.80.153.192/27 instead of the smaller 208.80.153.192/28 block
23:01 LeslieCarr: unpinned php-common in generic::webserver::php5
18:35 K4-713: synchronized payments cluster to r105401
17:04 apergos: restarted httpd on dataset2
15:14 apergos: thumb cleaner to bed for the night (it gets tired so easily :-P)
14:51 RobH: rebooting TS-amaranth per DaBPunkt request - unresponsive to ssh
13:16 mark: Fixed package situation on db1048
06:58 apergos: thumb cleaner starting its daily run
00:35 binasher: rebooting db1041 for new kernel to take effect.. troubleshooting xfs sync kernel panics - probably a failing raid controller
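Note on the 23:32 routing change above: a minimal sketch, using Python's standard `ipaddress` module, of how the /27 compares with the smaller /28 at the same base address. The variable names are illustrative only; the two prefixes are the ones given in the log entry.

```python
import ipaddress

old_block = ipaddress.ip_network("208.80.153.192/28")
new_block = ipaddress.ip_network("208.80.153.192/27")

# A /28 holds 16 addresses, a /27 holds 32.
print(old_block.num_addresses, new_block.num_addresses)

# The old /28 sits entirely inside the new /27.
print(old_block.subnet_of(new_block))

# First and last address of the larger block.
print(new_block[0], new_block[-1])
```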
December 6
22:21 K4-713: synchronized payments cluster to r105359
19:44 Jeff_Green: temporarily doubling php5/apache2's memory limit on aluminium to see if that helps civicrm
18:51 RobH: updated bz per rt2098 with removed_comment template
18:36 RobH: applied patch to BZ per RT 2098
16:00 Jeff_Green: dist-upgrade and reboot silicon
15:05 mark: Setup puppetmaster on stafford, moved puppet DNS aliases for testing. Local git repos need to be manually 'pulled' on stafford:/var/lib/git/operations/puppet for now
15:04 mutante: killed java process on search1 and restarted lsearchd, per rainman
14:59 rainman-sr: can someone look at search1, it seems to have java process that cannot be stopped with /etc/init.d/lsearchd stop, needs root
14:51 apergos: thumb cleaner to bed for the night
13:54 logmsgbot_: hashar synchronized wmf-config/InitialiseSettings.php 'codereview: auto deferres /trunk/extensions/SemanticMediaWiki'
13:47 apergos: very temporarily gained some space back on srv220 by clearing out temp but it will be tight again in an hour, can't see what's taking up the room besides scaler temp files but there is something
16:56 logmsgbot_: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 32757 - Enable MoodBar on Wikimedia Sverige Wiki'
16:28 logmsgbot_: reedy synchronized wmf-config/flaggedrevs.php 'Bug 32804 - mediawiki.org admins have rights to add undefined group "autochecked users"'
16:26 logmsgbot_: reedy synchronized wmf-config/flaggedrevs.php 'Bug 32804 - mediawiki.org admins have rights to add undefined group "autochecked users"'
16:07 cmjohnson1: shutting down ms2
14:31 apergos: thumbs cleaner suspended for the night
12:39 mutante: added tarin to site.pp, includes standard class, first puppet run finished
December 4
20:23 RobH: kicked apache on sockpuppet again, seems to have fixed puppet servers timing out on their runs
14:56 apergos: thumb cleaner suspended for the night
07:04 apergos: thumb cleaner back at work on ms5
01:57 logmsgbot: LocalisationUpdate failed
December 3
19:11 K4-713: synchronized payments cluster to r105072
16:21 apergos: please ignore the spikes behind the ms5 curtain, just checking some space usage in a few directories we haven't looked at yet
14:57 apergos: suspending thumb cleaner
10:48 apergos: thumb cleaner back in action
10:34 apergos: thumb cleaner suspended a few minutes ago, letting things settle down a bit
10:23 apergos: test
10:20 apergos: morebots dead, tried restarting and it's still refusing to log. bad bot!
10:00 apergos: was running some checks on ms5 to see how much gain we were really getting from thumb cleanup, please ignore the little spikes in the graph kthxbye
07:01 apergos: thumb cleaner started up again
December 2
20:04 logmsgbot: preilly synchronized php-1.18/extensions/MobileFrontend/ 'weekly update to mobile frontend'
19:45 apergos: well it was a nice thought but no. suspending til tomorrow morning
19:41 apergos: sneaking in a few more thumb deletions....
19:31 logmsgbot: awjrichards synchronized wmf-config/CommonSettings.php 'Un-disabling ContributionReporting special pages on officewiki'
19:30 logmsgbot: awjrichards synchronized wmf-config/InitialiseSettings.php 'Enabling ContributionReporting extension on officewiki'
17:51 ^demon|away: formey: dumping public svn to /svnroot in screen
16:28 apergos: cleaning up unused thumbs, slowly, on ms5: running as root in screen session there, from /root directory the command is zcat dirs-no-slashes.gz | python removeThumbDirs.py
13:29 RoanKattouw: Updated hume:/etc/cron.d/mw-update-special-pages to refer to /usr/local/bin rather than /home/wikipedia/bin . Now running the special-pages-small script by hand. Turns out the lock file creation wasn't needed after all
13:27 mutante: williams - upgrading facter,puppet,php5,.. from wmf repo
17:01 mutante: hume - installed mysql client 5.1, sym link of wikidiff2.so to php_wikidiff2.so in /usr/lib/php5/20090626, changed # to // in /etc/php5/cli/conf.d/fss.ini, changed variables_order in /etc/php5/cli/php.ini to be EGPCS
16:57 mutante: applied live hacks apergos did on fenari also on hume (see above), removed php5-wikidiff2 (php-wikidiff2 stays), touch /etc/cluster
16:13 RobH: db1025 hdd replaced
16:08 RobH: swapping failed drive on db1025 per rt2047
16:03 logmsgbot: reedy synchronizing Wikimedia installation... : Pushing ApiSandbox out to the cluster, not enabled anywhere yet
15:49 logmsgbot: catrope synchronized wmf-config/liquidthreads.php 'Fix notice about wmgLQTUserControlNamespaces being undefined'
15:46 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Group tweak for bug 32637'
15:37 logmsgbot: reedy synchronized wmf-config/CommonSettings.php 'Adding wmgBabelUseUserLanguage and making wmgBabelDefaultLevel from InitialiseSettings be respected'
21:17 logmsgbot: reedy synchronized flaggedrevs.dblist 'Bug 32591 - Enable Flagged Revisions on Latin Wikisource'
21:11 Reedy: Creating flagged revs tables on ruwikinews
21:10 Reedy: Creating flagged revs tables on siwiki
21:10 Reedy: Creating flagged revs tables on bawiki
21:10 Reedy: Creating flagged revs tables on elwikinews
21:09 Reedy: Creating flagged revs tables on vecwiki
21:09 Reedy: Creating flagged revs tables on fiwiki
21:08 Reedy: Creating flagged revs tables on ptwikibooks
21:08 Reedy: Creating flagged revs tables on lawikisource
20:52 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 32637 - Add Some user group to hindi wikipedia'
20:42 logmsgbot: awjrichards synchronized wmf-config/InitialiseSettings.php 'Actually re-enabling ContributionReporting for test, donate, foundation, and meta wikis'
20:36 mark: Puppet is currently failing on sanger because it wants to downgrade puppet; I'll fix that tomorrow
20:36 logmsgbot: awjrichards synchronizing Wikimedia installation... : Pushing ContributionReporting changes to display a friendlier 'this page temporarily disabled' message
19:50 mark: Upgraded sanger (hardy) to experimental snapshot release of Puppet 2.7.7rc2
19:26 cmjohnson1: replacing msw-c2-sdtpa
18:55 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 32636 - Change UploadNavigationUrl to new link'
18:55 notpeter: restarting apache on formey
17:32 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 32642 - Enable e-mail notifications for watchlist (EnotifWatchlist) on WMRU wiki'
15:26 RoanKattouw: Creating /var/lock/update-special-pages and /var/lock/update-special-pages-small on hume, as empty files owned by apache. Without these lock files, the update-special-pages cron jobs fail immediately
13:03 mark: Fixed broken package situation on ms1002
13:01 mark: Ran dist-upgrade on streber
11:45 logmsgbot: hashar synchronized wmf-config/codereview.php 'CodeReview: auto defer /trunk/extensions/AddThis an extension to add G+, facebook, twitter links to "like" stuff'
11:18 mark: Running reboot on streber, stuck on disk
11:07 mark: Running apt-get dist-upgrade && reboot on sockpuppet
09:47 mutante: transcode1 - installing security updates, new kernel, rebooting
09:03 mutante: same on zinc, upgrade apt itself and some libs
09:00 mutante: started ntpd and installed some upgrades on magnesium
18:43 logmsgbot: hashar: jenkins backup. The host was missing the SSL cert and apache did not start up. Fixed by Daniel Zahn https://gerrit.wikimedia.org/r/1156
17:43 mutante: gallium: installing new kernel and rebooting
17:42 logmsgbot: hashar: wrongly stopped jenkins while mutante was doing an upgrade. Jenkins process dead for some minutes :(
17:40 logmsgbot: hashar: restarting jenkins
17:15 mutante: gallium - installing some upgrades: apache2..,apt,bind9/dnsutils,,php5..,openjdk,puppet..RT 2039
22:21 mark: Upgraded puppet client to 2.7.7rc2 on formey (expecting lint breakage)
18:25 mark: Let puppet upgrade puppet master and puppet agent to 2.7.7rc2 on sockpuppet
November 25
22:45 apergos: live hacked a few things on fenari so people can use it: installed mysql client 5.1, sym link of wikidiff2.so to php_wikidiff2.so in /usr/lib/php5/20090626, changed # to // in /etc/php5/cli/conf.d/fss.ini, changed variables_order in /etc/php5/cli/php.ini to be EGPCS... this should all be puppetized for bastion hosts
18:10 mutante: secure.wm is up again, thanks Rob. (was disk fail. RT to replace hardware created)
21:21 RoanKattouw: Adding new moodbar_feedback_response table on all wikis
21:08 logmsgbot: hashar: Created symlink in php-1.18/extensions/FlaggedRevs for frontend pointing to presentation. Should fix localisation update from twn which use trunk. See r102741
16:10 logmsgbot: laner synchronized wmf-config/db.php 'Removing db19, for maintenance'
15:56 Ryan_Lane: nova upgrade is finished. there are still a few issues
15:36 RoanKattouw: Reason I did that was that I was seeing error messages like proc_open(): fork failed - Cannot allocate memory in /usr/local/apache/common-local/php-1.18/includes/parser/Tidy.php on line 174
15:35 RoanKattouw: and on srv256 too
15:32 RoanKattouw: Killing some memory-hogging sleeping Apache procs on srv253
12:38 mutante: synced InitialiseSettings to set wgBlockDisablesLogin to true for foundationwiki (RT 690)
01:14 logmsgbot: awjrichards synchronized wmf-config/InitialiseSettings.php 'Enabling ContributionReporting on metawiki'
01:05 K4-713: synchronized payments cluster to r103989
00:12 Ryan_Lane: starting gerrit
00:04 Ryan_Lane: shutting down gerrit
November 22
23:04 Tim: on kaulen: reverted the firewall rule and createaccount.cgi deny rule from yesterday, replaced the latter by temporarily disabling account creation with BZ "createemailregexp"
22:15 Ryan_Lane: started gerrit back up
22:10 Ryan_Lane: temporarily shutting down labsconsole
20:04 Ryan_Lane: upgrading nova from cactus to diablo. during this time period, instances may be occasionally inaccessible, and labsconsole may also be occasionally inaccessible
19:45 binasher: increased udp recv buffer on locke to 512M. i think lp-filter needs some work.
19:13 binasher: udp2log on emery was broken post upgrade due to files under /var/log/squid being owned by human users. fixed + restarted
17:34 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 32587 - Using localized section headings for uploads to Commons through upload-form'
17:32 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 32587 - Using localized section headings for uploads to Commons through upload-form'
15:35 mark: Ran apt-get dist-upgrade && halt on db17
12:34 mutante: sq38 - broken - giving up on it as it now hangs after RAC detection however you boot it
12:05 mutante: sq38 - Attempting PXE boot to reinstall (RT 2017)
06:40 Ryan_Lane: added a default sudo policy for labs instances. project members automatically have sudo privileges, excluding specific projects.
05:03 Tim: on bugzilla: allow reopened -> new transition to make dealing with malicious mass bug closures easier
02:54 Tim: fixed resolution and status fields of the rocketmail bugs
02:25 logmsgbot: LocalisationUpdate completed (1.18) at Tue Nov 22 02:28:06 UTC 2011
01:57 logmsgbot: LocalisationUpdate failed
01:57 K4-713: synchronized payments1, 2, and 3 to r103880
01:30 awjr: synchronizing thank_you module in CiviCRM on Grosley and Aluminium to r848
01:04 notpeter: restarting apache on formey
00:59 awjr: synchronizing thank_you module in CiviCRM on Grosley and Aluminium to r847
00:35 Tim: blocked all account creation on bugzilla due to tor abuse
00:30 logmsgbot: asher synchronized wmf-config/CommonSettings.php 'changed wgUDPProfilerHost to professor.pmtpa.wmnet'
00:24 K4-713: Synchronized payments cluster to r103869
00:23 domas: nfs home is fucked up, 1.5s response times on simple ops
00:21 logmsgbot: midom synchronized wmf-config/db.php 'taking out db19, was out for past week or so'
00:20 Tim: on kaulen: firewalled 46.165.196.182 due to bugzilla trolling
00:12 notpeter: upgrading udp2log on emery to udplog_1.8-2 and restarting udp2log
00:01 binasher: rebooting db11 to new kernel (it's out of rotation)
November 15
23:50 awjr: synchronizing payments cluster to r103267
23:48 logmsgbot: neilk rebuilt wikiversions.cdb and synchronized wikiversions files:
23:35 logmsgbot: neilk synchronizing Wikimedia installation... : re-enabling multi-file select for UploadWizard on Commons, quashing some bugs with UploadStash
23:31 maplebed: imported packages for swift to our repository: swift, swift-*, python-swift, python-eventlet, python-greenlet, python-webob
21:58 notpeter: can't pop a shell on searchidx2 via ssh or ipmi, rebooting
21:12 logmsgbot: catrope synchronized wmf-config/CommonSettings.php 'Comment out the account creation throttle hack for the Serbia event. Leaving it in as a comment so it can be used as a boilerplate for a next time'
19:39 awjr: updating frequency of donations queue consumption from every 5mins to 3mins in jenkins on aluminium
18:35 LeslieCarr: hostway transit moved
18:20 LeslieCarr: moving hostway transit from csw5-pmtpa to cr2-pmtpa
16:14 Jeff_Green: testing passive-checks config on nagios
15:48 mark: Converted eth0 to aggregated bond0 on sq69
15:12 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 31842 - Change of namespace names in Old Church Slavonic Wikipedia'
14:49 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 32119 - Change the diacritics in the namespaces at ro.wikinews'
14:41 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 32145 - Namespaces/Aliases for Wikipedia, Project and Portal namespace in Assamese wikipedia'
14:31 mutante: commented cronjob on locke, user nobody, that shipped aggregations to dammit.lt and got connection refused
14:07 mark: Shutting down sq75-78 for relocation
14:05 mutante: cleared exim paniclog on mchenry,lily & williams - all just temp. spamd failures, mostly from October
13:59 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 32340 - Translation of project namespace and sitename for tawikibooks'
13:51 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 32357 - Alias for project space in ko.wikiquote'
13:46 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 32370 - Enable Rollbacker, Eliminator and Interface_editors on jawiki and grant bureaucrats to add/remove these flags'
02:24 logmsgbot: LocalisationUpdate completed (1.18) at Sun Nov 13 02:27:47 UTC 2011
01:57 logmsgbot: LocalisationUpdate failed
November 12
15:56 apergos: removed some old snapshot files from ms7 (not actual snaps, but saved copies of them no longer useful), freeing up about 7T of space
15:47 logmsgbot: catrope synchronized wmf-config/CommonSettings.php 'Re-enable TitleBlacklist on kwwiktionary'
15:45 logmsgbot: catrope synchronized wmf-config/CommonSettings.php 'Temp disable TitleBlacklist on kwwiktionary'
09:44 logmsgbot: catrope synchronized wmf-config/CommonSettings.php 'Replace Botswana account creation throttle exemption with one for Serbia'
06:04 logmsgbot: neilk synchronized wmf-config/CommonSettings.php 'Goes w/ previous change -- re-enable Special:UploadWizard where it is in use (i.e. Commons)'
05:59 logmsgbot: neilk synchronized wmf-config/CommonSettings.php 'Goes w/ previous change -- also need to disable multi-file select when disabling chunks'
05:51 logmsgbot: neilk synchronized wmf-config/CommonSettings.php 'Disabling chunked uploads temporarily, according to Erik we seem to have not deployed the API that can handle them'
02:38 awjr: synchronizing payments cluster to r102843
02:28 logmsgbot: LocalisationUpdate completed (1.18) at Sat Nov 12 02:31:22 UTC 2011
02:22 logmsgbot: demon synchronized wmf-config/CommonSettings.php 'Set fallbackToAltUploadForm to true per Erik, temporary site issues'
01:57 logmsgbot: LocalisationUpdate failed
01:20 logmsgbot: aaron synchronized wmf-config/flaggedrevs.php 'arwiki: disabled autopromote, added Portal and Annex namespaces to $wgFlaggedRevsNamespaces'
21:55 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 32312 - Enable WikiLove on Japanese Wikipedia'
21:54 Reedy: Creating Wikilove tables on jawiki
21:00 mark: Running puppetd --test in ddsh, concurrency 8
20:29 mark: Undid the sockpuppet firewall rules
20:17 mark: Revert nagios changes
19:52 logmsgbot: aaron synchronized wmf-config/InitialiseSettings.php 'removed references to patrolother right that does not exist'
19:48 logmsgbot: aaron synchronized wmf-config/InitialiseSettings.php 'removed references to autopatrolother right that does not exist'
19:36 logmsgbot: ben synchronized wmf-config/db.php 'switching all external store clusters except cluster22 (current active) from srv/ms hosts to new ES hardware es3 and es4.'
19:31 mark: Truncated several puppet db tables again
20:03 maplebed: changed replication topology for external store cluster; new ES hosts are all now replicating off es3, which is still replicating off ms3.
19:52 AaronSchulz: ran fixBug32198.php and confirmed that there are no more affected rows atm
19:49 logmsgbot: tfinc synchronized wmf-config/InitialiseSettings.php 'Upping resolution of wmf wiki logo to 35px'
19:46 logmsgbot: tfinc synchronized wmf-config/InitialiseSettings.php 'Pushing config change again due to missing ssh agent'
19:01 logmsgbot: ben synchronized wmf-config/db.php 'switching external store cluster6 from srv hosts to new ES hardware es3 and es4.'
19:00 maplebed: switching external store cluster6 to new ES hardware.
18:14 mark: Converted eth0 on sq67 to an aggregated interface of eth0-4
17:25 mark: Converted eth0 on sq68 to an aggregated interface of eth0-4
15:15 logmsgbot: catrope synchronized wmf-config/CommonSettings.php 'bug 28026 trial - Enable $wgEnotifUserTalk on all wikis, but keep $wgShowUpdatedMarker disabled where it was disabled before'
15:07 RoanKattouw: ...after fixing permissions on cache/l10n AGAIN
15:07 RoanKattouw: Ran sync-l10nupdate
12:22 mutante: arsenic - alright known kernel issue with bonding, ack
12:17 mutante: arsenic - power cycling
12:15 mutante: arsenic - not reachable via ssh, mgmt shows BUG: soft lockup - CPU#6 stuck for 61s!
17:14 awjr: Updating banner hiding configuration for CentralNotice to hide banners until 2012-01-31
15:35 apergos: wgContributionTrackingDBserver -> db1008 from db9 in reporting-setup.php, that should be the last of those
14:26 apergos: wgContributionReportingDBserver -> db1025 after info from jeff green (but it's still access denied in the same fashion)
14:19 mark: Shutdown BGP session to AS1299 on br1-knams
13:57 apergos: changed wgContributionReportingDBserver in reporting-setup.php to point to db1008 instead of db10 (following jgreen's aug 18 note in server admin log) but getting Access denied for user 'public_reporting'. Jeff can you have a look?
13:26 logmsgbot: hashar: contribution reporting DB is unreachable: db10. The ContributionReporting extension still references db10 as a database server (db: civicrm)
09:41 Tim: stopping xinetd, puppet and extdist crontab on fenari to complete ED cleanup
16:44 maplebed: purged binary logs from >14d ago on db9 to free up disk space, set expire_logs_days to 14.
14:08 mark: Labs public NAT ip firewalled to tcp port 25 on lily
12:46 mark: Firewalled labs IP addresses to tcp port 25 on mchenry.
11:40 logmsgbot: catrope synchronized wmf-config/InitialiseSettings.php 'Enable ArticleFeedback dashboard on meta'
04:44 Ryan_Lane: added npm, nodejs, and etherpad-lite to the lucid-wikimedia apt repo
02:51 Tim: stopping puppet and xinetd and disabling extdist crontab on fenari to update extensions directory
02:32 Tim: on db10: removed the deleted relay logs from mysqld-relay-bin.index with a text editor, then ran "reset slave" and "change master" to get replication working again
02:26 logmsgbot: LocalisationUpdate completed (1.18) at Fri Nov 4 02:29:26 UTC 2011
02:16 Tim: db10 ran out of disk space on /a with 27GB of relay logs created in the last 2 hours. It wouldn't respond to "stop slave" or shut down gracefully, so I had to delete the relay logs manually to free up enough space for a graceful shutdown
02:07 Tim: fixing db10, is out of disk space
02:00 awjr: Enabling ParserFunctions string functions on donatewiki (config r2566 and r2567)
02:18 logmsgbot: LocalisationUpdate completed (1.18) at Wed Nov 2 02:21:36 UTC 2011
November 1
21:36 awjr: Deployed thank_you module fixes to CiviCRM on grosley r696
21:21 Jeff_Green: changed donate.wikimedia.org from CNAME to A record so we can use MX records
21:13 awjr: Configured DonationInterface to only load i18n files on the cluster, enabled on donatewiki and foundation (CommonSettings @ r2560, InitialiseSettings @ r2561)
21:10 awjr: Running scap to pick up new DonationInterface set up for the cluster and new message files (r101527)
19:55 awjr: Fixed borked config change in CommonSettings for LandingCheck, r2559
19:53 awjr: Config change to CommonSettings to make LandingCheck always point to foundationwiki for now r2558
19:45 binasher: note, locke and emery changes only made to the running kernel for testing, not in /etc/sysctl*
19:43 binasher: made the same kernel changes on emery as locke, udp packet loss dropped from 6% to 0
19:32 awjr: deploying LandingCheck updates (@ r101515) via sync-file as well as config change to CommonSettings.php (@ r2557)
18:52 notpeter: restarting lighttpd on lily
18:43 notpeter: restarting lighttpd on lily
18:30 binasher: locke: disabled bbu checks, set per5 to writeback caching
18:15 binasher: increased def recv buf on locke to 128M (just experimenting)
18:10 RobH: mgmt fixed on mw1027, mw1033, mw1038, mw1039
18:03 Ryan_Lane: re-enabling LDAP plugin on labsconsole
18:02 Ryan_Lane: temporarily disabling LDAP plugin on labsconsole to give admin rights to Andrew Bogott (stupid bugs)
17:49 binasher: on locke: set rmem_default to 64M and rmem_max to 128M, restarted udp2log. packet loss has stopped per /proc/net/udp which is definitive vs. seq numbers
17:21 RobH: mw1018 mw1021 firmware updated, mgmt working
03:16 logmsgbot: LocalisationUpdate completed (1.18) at Thu Oct 27 03:19:13 UTC 2011
03:13 Tim: running iperf tests on eqiad internal network to get baseline performance
03:07 Tim: taking mw1071 and mw1072 for udp2log testing
02:45 logmsgbot: LocalisationUpdate completed (1.18) at Thu Oct 27 02:47:47 UTC 2011
02:20 logmsgbot: LocalisationUpdate completed (1.18) at Thu Oct 27 02:22:56 UTC 2011
00:34 logmsgbot: neilk synchronizing Wikimedia installation... : MFT @ r99799, to ensure that UploadWizard multi-file select is off by default. Was accidentally on by default in last push, and it still needs work.
18:29 logmsgbot: catrope synchronized wmf-config/InitialiseSettings.php 'Enable AFT on meta properly. Set AFTNamespaces back to the default, and configure the proper category'
14:26 ^demon: updated users.php on svn.wm.o from svn copy. this should go in puppet anyway.
14:19 logmsgbot: catrope synchronized php-1.18/includes/OutputPage.php 'Shut up PHP notice'
09:25 mutante: nagios was broken due to services for host knsq14 while the host did not exist anymore, fixed by purge-nagios-resources.py which is called by the init script
22:06 Tim: rebooting hume, it went OOM and stopped responding
21:02 awjr: synced civicrm installs on grosley and aluminium to r671 of the wikimedia repo
20:13 LeslieCarr: updating dns
18:37 LeslieCarr: puppetizing mw1001-1160 - please ignore all alerts about them for now
17:52 logmsgbot: catrope synchronized wmf-config/InitialiseSettings.php 'Reenable ClickTracking UDP logger, now pointing to emery'
16:52 logmsgbot: catrope synchronized wmf-config/InitialiseSettings.php 'Disable ClickTracking UDP logging to offload locke. Collector had already been killed'
16:49 RoanKattouw: Killed AFT log collector on locke. Process was a direct child of init (PID 1), command line was /usr/bin/udp2log --config-file=/etc/udp2log/aft -p 842
12:18 mark: All eBGP peerings of br1-knams are off now, no more packet loss
12:10 mark: Reenabled some eBGP peerings of br1-knams as cr1-esams transit couldn't handle it alone
12:00 mark: Shutting down all BGP peerings of br1-knams
11:15 mutante: mw64 - updated APT and libpam, syslog has "nrpe invoked oom-killer","Out of memory: kill process ..(apache2)" etc.,check "free swap" in Ganglia
21:57 LeslieCarr: reformatting all of the wm****.eqiad machines
21:54 Ryan_Lane: turning off ci2
19:41 cmjohnson1: resetting storage1 to access bios and check setup
19:20 logmsgbot: aaron synchronized wmf-config/flaggedrevs.php 'pre-emptive wg(Add/Remove)Groups settings to account for r100636; should have no visible effect'
02:00 Tim: disabled CheckUser purge cron job due to excessive replication lag
01:40 Tim: on hume: installed CheckUser purging cron job
01:36 Tim: made a shell script to run extensions/CheckUser/maintenance/purgeOldData.php on all wikis, for use in a cron job. Running it on hume to test it.
20:14 logmsgbot: demon synchronized php-1.18/extensions/Contest/includes/ContestContestant.php 'Push r100446. Shame on Reedy for deploying then going afk'
20:08 Jeff_Green: taking payments4 out of pybal pool
18:18 logmsgbot: preilly synchronized php-1.18/extensions/MobileFrontend/MobileFrontend.php 'critical update to mobile frontend to support iPhone application'
17:35 logmsgbot: preilly synchronized php-1.18/extensions/MobileFrontend/MobileFrontend.php 'critical update to mobile frontend to support iPhone application'
23:32 logmsgbot: reedy synchronized wmf-config/CommonSettings.php 'Set Contest Email Sender Address/Name, still needs r100292 merging and pushing for this to take effect'
23:15 logmsgbot: reedy synchronized php-1.18/extensions/Contest/includes/Contest.class.php 'Comment out submission_count default to stop it overwriting the database value on edit'
16:45 Jeff_Green: switched db1025 to slave off of db1008 as temporary middle-master (db9-->db1008-->db1025)
13:48 Jeff_Green: grosley is back online, memory upgrade postponed due to mismatched RAM
13:32 Jeff_Green: grosley shut down for RAM upgrade
02:18 logmsgbot: LocalisationUpdate completed (1.18) at Tue Oct 18 02:21:02 UTC 2011
October 17
21:26 logmsgbot: awjrichards synchronized wmf-config/CommonSettings.php 'Updating list of priority countries for landing check'
20:47 rainman-sr: moved around things a bit on the search cluster, search7 seems to have been hitting I/O quite a bit. Relocated eswiki to search14, and "other namespaces" index slices for rest of searchpool3 wikis to search15
20:35 logmsgbot: awjrichards synchronized wmf-config/InitialiseSettings.php 'donate.wikipedia.org -> donate.wikimedia.org in wgServers'
20:25 logmsgbot: ariel synchronized wmf-config/lucene.php '...and remove from existing pool, oops'
18:38 apergos: tried increasing ulimit for open files from 8192 to 12288 for lsearch on search7, see /etc/init.d/lsearchd. (lots of "too many open files" in the logs) ... if this kills it please lower it again
18:08 preilly: done pushing critical MF fixes
18:07 logmsgbot: preilly synchronized php-1.18/extensions/MobileFrontend/views/layout/application.html.php 'critical update to mobile frontend'
18:01 logmsgbot: preilly synchronized php-1.18/extensions/MobileFrontend/MobileFrontend.php 'critical update to mobile frontend'
17:51 logmsgbot: preilly synchronized php-1.18/extensions/MobileFrontend/MobileFrontend.php 'critical update to mobile frontend'
17:49 apergos: restarted lsearchd on search7 again
17:49 preilly: push critical fix for MF
17:49 logmsgbot: preilly synchronized php-1.18/extensions/MobileFrontend/views/layout/_search_webkit.html.php 'critical update to mobile frontend'
17:48 logmsgbot: preilly synchronized php-1.18/extensions/MobileFrontend/MobileFrontend.php 'critical update to mobile frontend'
15:41 mark: restarted lsearchd on search7
05:32 Tim: restarted lsearchd on search7
04:21 notpeter: restarted nagios after a chmod of /etc/nagios/puppet_checks.d
02:22 logmsgbot: LocalisationUpdate completed (1.18) at Mon Oct 17 02:25:29 UTC 2011
18:27 RobH: nikerabbit now has shell deploy access via rt 1475
17:29 RobH: reviewed exim panic log on lily, williams, and mchenry for non zero size notifications. all errors were quite old (week or more) so they have been cleared out to stop the log spamming emails
22:40 logmsgbot: tstarling synchronized wmf-config/InitialiseSettings.php 're-enabling ArticleFeedback on the other wikis where it was temporarily disabled'
21:44 Tim: running PrefSwitch-addusertext.sql on hiwiki, huwiki, ptwiki, ptwikibooks
21:36 Ryan_Lane: adding wmflabs.* and wikimedialabs.* domains to DNS
21:28 logmsgbot: tstarling synchronized wmf-config/InitialiseSettings.php 're-enabling article feedback for enwiki'
21:13 logmsgbot: asher synchronized php-1.18/includes/job/JobQueue.php 'temporarily removing select for update'
20:56 Jeff_Green: moved donate wiki VirtualHost from remnants.conf to main.conf so it can catch donate.wikipedia.org
19:55 logmsgbot: py synchronized wmf-config/mc.php 'swapping two mc instances for LAST TWO upgrades to lucid'
19:17 logmsgbot: tstarling synchronized wmf-config/InitialiseSettings.php 'disabling ArticleFeedback because it is generating SQL fatal errors due to broken SimpleSurvey'
19:17 Reedy: Created WikiLove tables on all wikis that require it
19:10 logmsgbot: reedy synchronized php-1.18/includes/UserMailer.php 'r99646 Hopefully reduce number of enotif jobs by not inserting them in the first place if they're not going to send anything'
18:55 Reedy: Created wikilove tables on nowiki
18:38 Tim: running various tests against en.wikipedia.org to try to find rogue apaches and/or "can't connect to localhost" db errors
17:52 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 31295 - Define transwiki import source for kmwikt (Khmer Wiktionary) from en, fr, th, and lo'
17:45 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 31372 - Install Narayam in all Sinhala wiki projects'
17:41 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 31525 - Change logo on sg.wikipedia'
17:23 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 31553 - Enable preference to send notification emails for watchlists on the WMUK wiki'
17:19 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'For bug 31539 Swap some urls to relative'
17:12 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Enable ClickTracking as prerequisite for AF on mlwiki'
16:37 RobH: sq31 reinstalling, had dead drives replaced
16:30 logmsgbot: py synchronized wmf-config/mc.php 'swapping two mc instances for upgrade to lucid'
16:29 RobH: sq31 is being poked at
15:39 notpeter: also removed srv192 and srv249 from bits lb pool for upgrades to lucid
15:36 notpeter: restarted nagios
05:47 logmsgbot: reedy synchronized wmf-config/CommonSettings.php 'Bug 31634 - qqq extra language missing for Translate on Meta'
02:31 Jeff_Green: tweaked remnants.conf and redirects.conf, running sync-apache and apache-graceful-all
02:14 logmsgbot: LocalisationUpdate completed (1.18) at Wed Oct 12 02:17:03 UTC 2011
22:12 logmsgbot: neilk synchronized php-1.18/extensions/UploadWizard/resources/mw.UploadWizardLicenseInput.js 'fix issue with UploadWizard next button on licensing step and jquery 1.6.4'
19:13 RobH: updated netboot.cfg for cp1001-1042 on brewster
19:12 RobH: installing os on cp1001-cp1042 per RT 1679
18:25 logmsgbot: aaron rebuilt wikiversions.cdb and synchronized wikiversions files: test2wiki to 1.18
16:12 RobH: mw59-mw74 OS installed, awaiting puppetization
15:47 RobH: installing os on mw59-mw75
14:58 mark: Merged lots of puppet changes, gonna hide now
14:03 RobH: updated dns for bd.wikimedia.org, project still not live
13:52 notpeter: stopping apache and job queue runners on all external store boxes
13:15 notpeter: powercycling srv266, as this is one of our favorite pastimes.
12:21 logmsgbot: catrope synchronized wmf-config/CommonSettings.php 'Replace expired account creation throttle exemption for an India event with one for a Brazil event'
19:21 RobH: set payments3 and payments4 to false in pybal, depooling them per request of jeff via arthur
19:19 RobH: mw1111 & mw1122 fixed
19:15 logmsgbot: py synchronized wmf-config/mc.php 'pushing new mc.php to swap 2 more mc instances'
19:01 mutante: db12 - fix date/NTP
18:58 logmsgbot: py synchronized wmf-config/mc.php 'pushing new mc.php to swap 2 more mc instances'
18:56 RobH: mw1120 fixed
18:52 RobH: mw1097 & mw1112 fixed, os installing
18:49 logmsgbot: catrope synchronized php-1.18/extensions/LiquidThreads/lqt.js 'Actually sync the right file for r99024'
18:48 logmsgbot: preilly synchronized php-1.18/extensions/MobileFrontend/stylesheets/iphone.css 'update to mobile frontend emergency fix for search bar width'
18:48 preilly: pushing style fix for new search bar width
18:48 logmsgbot: preilly synchronized php-1.18/extensions/MobileFrontend/stylesheets/iphone2.css 'update to mobile frontend emergency fix for search bar width'
18:47 logmsgbot: preilly synchronized php/extensions/MobileFrontend/stylesheets/iphone.css 'update to mobile frontend emergency fix for search bar width'
18:47 logmsgbot: preilly synchronized php/extensions/MobileFrontend/stylesheets/iphone2.css 'update to mobile frontend emergency fix for search bar width'
18:46 logmsgbot: preilly synchronized php-1.18/extensions/MobileFrontend/stylesheets/iphone2.css 'update to mobile frontend emergency fix for search bar width'
18:46 logmsgbot: preilly synchronized php-1.18/extensions/MobileFrontend/stylesheets/iphone.css 'update to mobile frontend emergency fix for search bar width'
18:44 notpeter: removing srv190, srv194-srv213, and srv225 from apaches pool for upgrades
18:25 logmsgbot: preilly synchronized php-1.18/extensions/MobileFrontend/views/layout/_search_webkit.html.php 'update to mobile frontend emergency fix for search bar width'
18:24 mutante: sq48 is running but offline - cable connected? eth0: <NO-CARRIER
18:24 preilly: push fix for search bar width
18:24 logmsgbot: preilly synchronized php/extensions/MobileFrontend/views/layout/_search_webkit.html.php 'update to mobile frontend emergency fix for search bar width'
18:21 notpeter: putting srv226-srv242 back into apaches pool
18:07 notpeter: putting srv243-srv247 back into apaches pool
17:26 RobH: cleared errors on mw1067 and mw1070, power was loose
17:15 RobH: mw1019 repaired, os install in progress
17:10 RobH: mw1002 & mw1016 repaired, os install in progress
16:24 Jeff_Green: swinging payments.wikimedia.org DNS from 208.80.152.7
15:57 logmsgbot: reedy synchronizing Wikimedia installation... : Seems we might have a few files out of sync
14:46 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 31371 - Assamese Vs Latin numerals in templates using parser functions, Set $wgTranslateNumerals = false; for aswiki'
14:42 mark: Allocated LVS service IPs for mobile-lb.pmtpa, and payments-lb.(pmtpa|eqiad)
21:21 robh: squid instances crashed out with disk swapping, restarted instances on sq31
21:19 robh: sq31 raid rebuilt
21:14 robh: manually started raid rebuild on sq32, it's looking good and is in progress
20:53 notpeter: restarted gammu-smsd. it seems to be sending pages now. just in time for the 1.18 deploy! you're welcome ;)
20:46 notpeter: editing spence:/etc/gammurc and spence:/etc/gammu-smsdrc to point to /dev/ttyUSB1. because that's where it seems the sms sending stick decided to mount....
18:43 Jeff_Green: dist-upgrade and rebooting grosley
18:17 logmsgbot: reedy synchronized wmf-config/CommonSettings.php 'Remove old yaseo references'
17:44 preilly: need to push fix for double ? in view link
17:44 logmsgbot: preilly synchronized php/extensions/MobileFrontend/MobileFrontend.php 'update to mobile frontend emergency fix to querystring'
17:43 logmsgbot: preilly synchronized php-1.18/extensions/MobileFrontend/MobileFrontend.php 'update to mobile frontend emergency fix to querystring'
17:37 logmsgbot: preilly synchronized php-1.18/extensions/MobileFrontend/MobileFrontend.php 'update to mobile frontend emergency fix to querystring'
17:36 preilly: fix issue with mobile frontend view on regular link
17:36 logmsgbot: preilly synchronized php/extensions/MobileFrontend/MobileFrontend.php 'update to mobile frontend emergency fix to querystring'
17:06 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Update NS 105 for fawikisource'
17:01 notpeter: restarting and upgrading to lucid srv226-247
16:58 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 30392 - zh-yue.wikipedia should use English namespace names as well as Wikipedia/Wikipedia_talk for NS_PROJECT[_TALK]'
16:44 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Update NS 105 for fawikisource'
16:43 notpeter: stopping all services (memcache, job-runner,apache, and puppet) on srv226-247
16:12 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Add index NS to fawikisource'
16:06 logmsgbot: py synchronized wmf-config/mc.php 'new mc.php to swap 1 more memcache instance for lucid upgrades'
15:41 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 30721 - add namespace alias on latin wikisource (la.ws)'
15:38 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Set more mznwiki aliases'
12:43 notpeter: gracefulling apache on srv155 to load new php ini's
02:41 logmsgbot: tstarling synchronized wmf-config/InitialiseSettings.php 're-enabled Collection on eswiki'
02:39 Tim: killed mw-qserve on pdf1 and pdf3 and let supervise restart it. Apparently the queue size counters drifted and they thought they were busy when they weren't.
02:30 Tim: killed some stuck mw-zip processes on pdf1 and pdf3
02:29 logmsgbot: LocalisationUpdate completed (1.18) at Tue Oct 4 02:31:47 UTC 2011
02:26 logmsgbot: LocalisationUpdate completed (1.17) at Tue Oct 4 02:28:59 UTC 2011
02:24 Tim: killed some stuck imagemagick processes on pdf1, some were running since 2010!
02:20 Tim: restarted gmond on pdf2, it was probably screwed up because of the presence of the ganglia-monitor package
01:21 logmsgbot: tstarling synchronized wmf-config/InitialiseSettings.php 'disabling collection extension on eswiki due to DoS'
01:14 logmsgbot: tstarling synchronized php-1.17/extensions/Collection/Collection.body.php 'patch to send XFF header'
01:13 logmsgbot: tstarling synchronized php-1.18/extensions/Collection/Collection.body.php 'patch to send XFF header'
01:12 mutante: sq36,sq37,sq38 - upgrade kernel to 2.6.32-34/dist-upgrade,reboot (RT #1612)
00:49 mutante: sq33,sq34,sq35 - upgrade kernel to 2.6.32-34/dist-upgrade,reboot (RT #1612)
00:39 maplebed: powercycled pdf2 because it wouldn't respond to ssh
October 3
23:59 maplebed: added ganglios to lucid apt repo
23:03 logmsgbot: aaron rebuilt and synchronized wikiversions files: Moving enwikibooks to 1.18
22:42 AaronSchulz: Added sync-dir script, modified sync-common-file a bit to handle it
02:19 logmsgbot: LocalisationUpdate completed (1.18) at Sat Oct 1 02:21:50 UTC 2011
02:17 logmsgbot: LocalisationUpdate completed (1.17) at Sat Oct 1 02:19:37 UTC 2011
00:05 logmsgbot: catrope synchronized wmf-config/CommonSettings.php 'Use interwiki-pr.cdb for wikis with $wmgHTTPSExperiment (right now that means all wikis)'
20:28 RoanKattouw: Killing three really long-running FlaggedRevs queries on db25, its replag is insane
20:11 notpeter: taking down srv278 and srv258-srv272 for upgrades to lucid (not doing upgrades yet, am going to see if things are stable without them first)
19:57 logmsgbot: catrope synchronized php-1.18/includes/parser/ParserCache.php 'Bump pcache number for action=parse hack'
19:57 logmsgbot: catrope synchronized php-1.17/includes/parser/ParserCache.php 'Bump pcache number for action=parse hack'
19:52 logmsgbot: catrope synchronized wmf-config/secure.php 'Apply article path hack in GetLocalURL too, needed for API parse hack'
19:26 logmsgbot: py synchronized wmf-config/mc.php 'pushing out new mc.php to rotate out 1 more of a coming many boxes for upgrade to lucid'
19:05 Ryan_Lane: making DNS changes for wikiquote to point to wikiquote-lb instead of text; includes all langlist cnames
18:51 logmsgbot: py synchronized wmf-config/mc.php 'pushing out new mc.php to rotate out 3 more of a coming many boxes for upgrade to lucid'
18:23 Ryan_Lane: making DNS changes for wikibooks to point to wikibooks-lb instead of text; includes all langlist cnames
18:13 logmsgbot: py synchronized wmf-config/mc.php 'pushing out new mc.php to rotate out 2 more of a coming many boxes for upgrade to lucid'
17:59 Ryan_Lane: making DNS changes for wikisource to point to wikisource-lb instead of text; includes all langlist cnames
17:58 notpeter: putting mw28-49 and mw51-58 into the apaches pool
17:15 logmsgbot: py synchronized wmf-config/mc.php 'pushing out new mc.php to rotate out 2 more of a coming many boxes for upgrade to lucid'
16:28 logmsgbot: py synchronized wmf-config/mc.php 'pushing out new mc.php to rotate out 2 more of a coming many boxes for upgrade to lucid'
16:27 Ryan_Lane: making a *lot* of DNS changes. Please don't push DNS changes right now, if you know what's good for you.
16:19 Ryan_Lane: pooling ssl3004
16:14 Ryan_Lane: changed aggregators for ssl_esams from maerlant to ssl3001 and ssl3002 in gmetad
16:13 Ryan_Lane: pooling ssl3003
16:08 Ryan_Lane: pooling ssl3002
16:06 Ryan_Lane: installing new wikimedia-task-dns-auth package on dobson
16:03 Ryan_Lane: pooled ssl3001, depooling maerlant
16:02 Ryan_Lane: ipv6and4.labs moved to ssl3001
15:50 robh: ssl3002-ssl3004 os installed
15:33 Ryan_Lane: restarting pybal on amslvs1
15:29 RoanKattouw: Clearing /tmp directories on all mediawiki-installation hosts of files that haven't been accessed in more than 2 hours and are not in a subdirectory (this protects mw-cache files)
15:26 robh: ssl3002-ssl3004 installing
15:25 robh: cleaned up netboot.cfg (removing entries for servers that don't exist anymore), added in ssl esams range, updated dhcp
15:25 Ryan_Lane: restarting pybal on amslvs2
15:24 Ryan_Lane: restarting pybal on amslvs4
15:22 robh: ssl3001 installed with os, ready for puppet deployment
15:18 RoanKattouw: Using dsh to remove /tmp/mw-cache from all mediawiki-installation boxes. It's an obsolete caching dir with 300-400MB of dead data on each server
15:14 Ryan_Lane: restarting pybal on amslvs3
15:12 RoanKattouw: Removed all /tmp files that hadn't been accessed in >48 hours on srv255
15:05 Ryan_Lane: adding new wikimedia-task-dns-auth to repository
15:03 logmsgbot: py synchronized wmf-config/mc.php 'pushing out new mc.php to rotate out 2 more of a coming many boxes for upgrade to lucid'
14:51 Ryan_Lane: updating dns, adding mediawiki-lb using text's IP
14:50 robh: updating dns with ssl3001-ssl3004
14:32 logmsgbot: py synchronized wmf-config/mc.php 'pushing out new mc.php to rotate out 2 of a coming many boxes for upgrade to lucid'
13:55 mark: Configured csw1-esams switchports for ssl3001-3006
12:37 mark: shutdown server iris for decommissioning
11:36 RoanKattouw: Edited /home/wikipedia/bin/sync-common-file and added -o -oSetupTimeout=10 to the dsh command line. This makes sync-file not hang forever when an Apache is down. Thanks to maplebed for the tip
09:56 logmsgbot: catrope synchronized wmf-config/CommonSettings.php 'Disable interwiki-pr.cdb, again. I swear I disabled this before but I guess it came back'
09:52 logmsgbot: catrope synchronized php-1.17/includes/parser/Parser.php 'Actually deploy r98197, grr, forgot to run svn up before scapping last night'
00:12 logmsgbot: ben synchronized wmf-config/db.php 'retabbed the db.php; no content changes.'
00:00 logmsgbot: ben synchronized wmf-config/db.php 'putting two apache hosts into rotation for cluster8 since they're up, happy, and have correct content.'
22:35 logmsgbot: asher synchronized wmf-config/db.php 'setting s5 to ro for master swap'
22:32 binasher: setting s5 to ro, swapping master
22:20 logmsgbot: ben synchronized wmf-config/db.php 'removing ms1 from rotation again until its mysql configs get fixed. too much system cpu and connection timeouts'
21:47 logmsgbot: asher synchronized wmf-config/db.php 'setting s6 to ro for master switch'
21:46 logmsgbot: catrope synchronizing Wikimedia installation... : Reinstate MobileFrontend changes that I reverted while debugging the broken Nagios check for mobile
21:45 binasher: setting db cluster S6 to read only for master rotation
21:37 logmsgbot: catrope synchronized wmf-config/CommonSettings.php 'Revert r2352: make $wgRightsUrl and $wgRightsIcon protocol-relative again. Should be fine with the WikimediaMessages fix'
21:21 notpeter: pushing new dns templates after a typo correction
21:08 RoanKattouw: So to set the record straight, I removed $urlprotocol from $wgRights* and that may or may not have been what fixed the copyright issue
21:05 RoanKattouw: I meant remove $urlprotocol
21:05 logmsgbot: catrope synchronized wmf-config/CommonSettings.php 'Remove from * because I don't know what else to try'
20:14 Jeff_Green: rebooting payments4 for apt dist-upgrade fun
20:07 Ryan_Lane: restarting gerrit. its error log stopped logging
19:50 logmsgbot: catrope synchronized wmf-config/InitialiseSettings.php 'Here goes... Set $wgServer protocol-relative on all wikis, and enable $wmgHTTPSExperiment on all wikis'
19:46 logmsgbot: catrope synchronized wmf-config/CommonSettings.php 'Use $urlprotocol in two more places'
19:44 logmsgbot: catrope synchronized wmf-config/CommonSettings.php 'When $wmgHTTPSExperiment is used, do NOT use protocol-relative URLs for en.wikipedia.org; that domain does not support https right now'
19:38 logmsgbot: catrope synchronized wmf-config/InitialiseSettings.php 'Make most of InitialiseSettings protocol-relative. Left out domains that do not yet have HTTPS, and wgServer'
19:14 logmsgbot: catrope synchronized wmf-config/InitialiseSettings.php 'Kill prstdlogo and make stdlogo protocol-relative'
19:14 logmsgbot: catrope synchronized wmf-config/CommonSettings.php 'Make all logo URLs protocol-relative'
18:36 logmsgbot: asher synchronized wmf-config/db.php 'commenting out db39'
18:34 binasher: new s7 master position db16-bin.000221 515053560
16:17 notpeter: adding mw1, mw2, and mw3 to apaches pool for testing.
16:11 logmsgbot: py synchronizing Wikimedia installation... : scappin' to make sure that mw1-27 boxxies are up to date before I put them into apaches pool
15:38 notpeter: moving dsh group job-runners2 to root's home dir on fenari, as it seems out of date. if this breaks anything, that's where it can be found.
15:38 mark: Configured switchports for ssl1-4
13:29 RoanKattouw: Fixed ownership&permissions for /h/w/common/*.dblist , they were a mess and caused sync-dblist to fail for non-roots. Ran chgrp wikidev *.dblist; chmod g+w *.dblist;
10:01 RoanKattouw: Updating prototype.wikimedia.org/rc-en to 1.18wmf1
03:02 Tim: on srv182: restarted memcached to reduce memory usage, and started apache and puppet
03:01 logmsgbot: tstarling synchronized wmf-config/mc.php 'removed srv182 from memcached since it only has 4GB, added srv237 instead'
02:23 logmsgbot: LocalisationUpdate completed (1.18) at Mon Sep 26 02:26:09 UTC 2011
02:21 logmsgbot: LocalisationUpdate completed (1.17) at Mon Sep 26 02:23:39 UTC 2011
07:30 apergos: yet another search7 lsearchd restart. I did not see one cpu pegged, in theory the box would not have been depooled, right? alas I was not awake enough to remember to check that
02:24 logmsgbot: LocalisationUpdate completed (1.18) at Sun Sep 25 02:26:52 UTC 2011
02:21 logmsgbot: LocalisationUpdate completed (1.17) at Sun Sep 25 02:24:17 UTC 2011
September 24
19:09 Ryan_Lane: adding the python-rtkit package to the lucid wikimedia repo
18:38 logmsgbot: aaron rebuilt and synchronized wikiversions files... : testing message; no changes
17:08 mark: Enabled LLDP on all Foundry access switches (asw-a4-sdtpa, asw-a5-sdtpa, asw-d1-sdtpa, asw-d2-sdtpa, asw-d3-sdtpa)
16:41 robh: updated blog to add fundraising banner
15:37 robh: mw6 os installed, ready for puppet/service
15:25 robh: did some tinkering on blog themes, any odd issues in the past 5 minutes with blog appearance are due to that. all changes reverted and blog is back to normal
15:17 robh: mw5 os installed, ready for puppet/service
15:04 mark: Setup firewall filter on cr1-sdtpa:ae0.105 and cr2-pmtpa:irb.105; reactivated rpf-check
15:03 robh: mw2 installed, ready for puppet/service
15:01 Jeff_Green: depooling payments4 for testing fun
14:59 robh: shutting down dataset1 for mainboard swap, confirmed with ariel the other day that it's not serving live traffic
14:51 robh: mw13 installed, ready for puppetization and service
08:22 logmsgbot: ariel synchronized php-1.17/maintenance/dumpTextPass.php 'mwscript wrapper for script call'
08:20 logmsgbot: ariel synchronized php-1.18/maintenance/dumpTextPass.php 'mwscript wrapper for script call'
02:25 logmsgbot: LocalisationUpdate completed (1.18) at Fri Sep 23 02:27:39 UTC 2011
02:22 logmsgbot: LocalisationUpdate completed (1.17) at Fri Sep 23 02:25:11 UTC 2011
01:07 logmsgbot: preilly synchronized php/extensions/MobileFrontend/MobileFrontend.php 'update mobile frontend code'
00:29 preilly: pushing TestCanonicalRedirect hook
00:29 logmsgbot: preilly synchronized php/includes/Wiki.php 'update mobile frontend code'
00:23 logmsgbot: preilly synchronized php/extensions/MobileFrontend/javascripts/application.js 'update mobile frontend code'
00:19 logmsgbot: preilly ran sync-common-all 'update mobile frontend code'
00:19 preilly: pushing today's mobile frontend push
September 22
23:07 binasher: running schema migrations against eqiad and analytics slaves
22:19 Ryan_Lane: tweaking performance settings for gerrit, restarting
20:49 notpeter: putting srv274 and srv273 back in the apaches pool
20:20 binasher: additive liquidthread 1.18 migrations completed on all slaves, still need to do masters. some slaves could use an index drop
19:58 mark: Set static routes of 10.4.0.0/24 and 208.80.153.192/32 (in addition to 208.80.153.192/28) to 10.4.16.3 on cr1-sdtpa and cr2-pmtpa
19:55 logmsgbot: ben synchronized wmf-config/db.php 'db13 is now caught up with replication. Putting it back in rotation.'
19:49 mark: Deconfigured cr1-sdtpa and cr2-pmtpa for vlan 103; they no longer have an interface in that vlan
19:40 binasher: preparing to run liquidthreads 1.18 schema migrations against slaves
19:21 Jeff_Green: rebooting silicon to test firewall rule setup
18:16 robh: installing mw1-mw27
18:14 robh: updated netboot apaches.cfg to include a tmp partition of 2gb
17:59 logmsgbot: ben synchronized wmf-config/db.php 'pulling db13 out of s2 because it looks like the check slave status cache does not work - the host is being pummelled by show slave status checks.'
17:55 logmsgbot: ben synchronized wmf-config/db.php 'db13 back in rotation now that slaving is fixed, though it's still lagged, which is ok because MW will not use lagged slaves'
17:31 AaronSchulz: Hacked in common/edit.js symlink into 1.18 for LQT
17:22 robh: updating dns for new mw1-mw74
17:18 maplebed: pulled db13 out of rotation (cluster s2) because slaving is broken
17:17 logmsgbot: ben synchronized wmf-config/db.php 'slaving is broken on db13. removing it from rotation in cluster s2.'
16:38 notpeter: returning srv273 and srv274 to apaches pool
16:05 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Gendered namespaces for test2wiki'
16:00 notpeter: putting srv275 back into apaches pool and removing srv273 for upgrade to lucid
15:42 notpeter: returning srv276 to apaches and removing srv274 for upgrade to lucid
15:30 notpeter: removing srv275 from apaches pool for upgrade to lucid
15:06 logmsgbot: reedy synchronized php-1.18/includes/Linker.php 'Error handling for null given rather than a Title'
15:05 notpeter: removing srv276 from apaches pool for upgrade to lucid
15:04 Reedy: Who left the live hack in php-1.18 Linker.php type coercing public static function makeMediaLinkObj( Title $title, $text = '', $time = false ) {
15:00 logmsgbot: reedy synchronized php-1.18/includes/Linker.php 'Testing for PHP fatal error in /home/wikipedia/common/php-1.18/includes/Linker.php line 831'
14:58 logmsgbot: reedy synchronized php-1.18/includes/Linker.php 'Testing for PHP fatal error in /home/wikipedia/common/php-1.18/includes/Linker.php line 831'
14:57 logmsgbot: reedy synchronized php-1.18/includes/Linker.php 'Testing for PHP fatal error in /home/wikipedia/common/php-1.18/includes/Linker.php line 831'
12:01 mutante: lvs1003's disk died - see RT #1549
11:40 mutante: power cycling lvs1003
07:41 awjr: restarting jenkins on grosley
07:27 logmsgbot: aaron synchronized php-1.18/extensions/SpamBlacklist/SpamBlacklist_body.php 'disabled wfDebugLog() call; UDP packets were way too large'
03:53 logmsgbot: tstarling synchronized wmf-config/CommonSettings.php 'fixed thumbDir in $wgForeignFileRepos, should not be used for anything but it may as well be set correctly'
02:12 Tim: cleaning up /tmp on srv215, which is almost out of disk space in its root partition
01:06 logmsgbot: asher synchronized php-1.17/includes/ExternalStoreDB.php 'bug 31052 : live hack to support reading from old non-slave external store servers'
00:07 maplebed: shutting down mysql on ms1 to replicate to eqiad. It's already out of rotation.
23:30 logmsgbot: preilly synchronized php/extensions/MobileFrontend/views/information/disable.html.php 'emergency fix for disable link endless loop'
23:29 logmsgbot: preilly synchronized php/extensions/MobileFrontend/views/information/disable.html.php 'emergency fix for disable link endless loop'
23:26 preilly: emergency fix for disable link endless loop on mobile frontend
23:21 maplebed: increased ganglia's tmpfs from 700M to 1G and restarted gmetad. it's all happy again, but we've lost ~8hrs ganglia data.
23:08 maplebed: ganglia's tmpfs is out of space.
22:55 logmsgbot: asher synchronized wmf-config/db.php 'experimenting with an externalstore cluster in readOnlyBySection'
22:50 notpeter: putting srv287 back in the apaches pool
22:13 notpeter: removing srv287 from apaches pool to upgrade to lucid
22:05 logmsgbot: asher synchronized php-1.17/includes/resourceloader/ResourceLoaderFileModule.php 'live hack - prevent resourceloader from serving errors when a db master is set to read-only'
21:05 binasher: invalidated varnish mobile cache due to cached pages linking to broken javascript
20:52 logmsgbot: aaron synchronized wmf-config/flaggedrevs.php 'removed duplication with delabs config; tabify some w/s'
20:18 Jeff_Green: rebuilding db1048 per ticket #1477
20:02 notpeter: putting srv288 back into apaches pool
19:55 logmsgbot: aaron synchronized wikiversions.cdb 'moving delabs to 1.18'
19:55 logmsgbot: aaron synchronized wikiversions.dat 'moving delabs to 1.18'
19:31 preilly: fix caching issue for javascript
19:30 logmsgbot: preilly synchronized php/extensions/MobileFrontend/views/layout/application.html.php 'fix javascript issue with wrong version in cache'
14:50 robh: pushed puppet update for erik z public key. stat1 old authorized keys wiped and new file in place
14:29 ^demon: if you run a 1.17 maintenance script on a 1.18 wiki, you get a fatal about not being able to find Math extension. Odd, since the check in CommonSettings.php should be sufficient (and $wmfVersionNumber is correct in saying 1.18)
13:42 Reedy: Finished creating user_former_groups table on all wikis
13:38 Reedy: Creating user_former_groups tables on all wikis
13:29 Reedy: Created user_former_groups tables on test2wiki
09:05 logmsgbot: catrope synchronized php-1.18/includes/resourceloader/ResourceLoaderContext.php 'Experimental fix for another ResourceLoader back compat issue'
09:00 logmsgbot: catrope synchronized php-1.18/resources/mediawiki/mediawiki.js 'Experimental fix for ResourceLoader back compat issue'
08:58 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Enable LQT on test2wiki'
08:53 Reedy: Created LiquidThreads tables on test2wiki
22:54 binasher: test2wiki on s3 has been migrated by hand
22:52 binasher: 1.18 schema migrations completed on all s1 slaves, starting s2
22:04 logmsgbot: aaron synchronized wmf-config/InitialiseSettings.php 'Pre-emptively disable file movers for "user" group (for 1.18, r93871)'
19:48 binasher: 1.18 schema migrations completed against enwiki on the first slave (db36)
19:04 logmsgbot: neilk synchronizing Wikimedia installation... : regular UploadWizard update, a few changes to make JS message parser throw less fatals, other misc small fixes
19:02 binasher: begin run of /home/w/common/php-1.18/maintenance/upgrade-1.18wmf1-1.php
18:39 logmsgbot: LocalisationUpdate failed
18:38 preilly: removing show/hide from blackberry for now
18:38 logmsgbot: preilly synchronized php/extensions/MobileFrontend/DeviceDetection.php 'remove show/hide for now from blackberry'
17:50 RoanKattouw-BSc: Moved /home/wikipedia/common/php-1.17-test/includes/.swp to /home/midom/php-1.17-test-includes-.swp , was causing sync errors due to restrictive perms
17:47 logmsgbot: aaron ran sync-common-all 'pushing 1.18 files out to apaches'
17:40 logmsgbot: aaron ran sync-common-all 'pushing 1.18 files out to apaches'
17:37 AaronSchulz: created php-1.18/cache/trusted-xff.cdb
17:28 AaronSchulz: Ran checkoutMediaWiki script just to add php-1.18 non-svn files/symlinks (reedy checked out already)
16:54 logmsgbot: reedy synchronizing Wikimedia installation... : Deployment of 1.18 files to apaches
16:51 Reedy: Copy of php-1.18 to /home/wikipedia/common/php-1.18 finished, taking just under 46 minutes
16:03 Reedy: copying /tmp/php-1.18 to /home/wikipedia/common/php-1.18 in a screen session on fenari as reedy
02:19 logmsgbot: LocalisationUpdate completed (1.17) at Sun Sep 18 02:21:34 UTC 2011
September 17
17:43 AaronSchulz: fixed 10update script, same bug with $mwVerDbSets
13:00 domas: manually placed /etc/php5/conf.d/apc-aux.conf on lucid apaches, apparently working APC cache helps a lot with request processing
11:26 mark: PyBal on lvs1-3 was in a BGP colliding loop with cr2-pmtpa, restarted PyBal on all three hosts
10:44 domas: made profile hitrate collection less crashy, restarted it (from ~tstarling/src/udpprofile/, not sure where else it should've been)
10:44 mark: Upgrade of cr2-eqiad complete
10:34 mark: Upgrading re1.cr2-eqiad to junos 10.4R7.5
10:28 mark: Performing failover from cr2-eqiad re1 to re0
10:18 mark: Upgrade of cr1-eqiad complete
10:08 mark: Upgrading re0.cr1-eqiad to junos 10.4R7.5
09:55 mark: Upgrading re0.cr2-eqiad to junos 10.4R7.5
09:46 mark: Performing failover from cr1-eqiad re0 to re1
09:33 mark_: Upgrading re1.cr1-eqiad to junos 10.4R7.5
02:22 logmsgbot: LocalisationUpdate completed (1.17=aawiki 1.17-test) at Sat Sep 17 02:25:20 UTC 2011
00:00 notpeter: removing srv288 from the apaches pool for upgrade to lucid
September 16
23:48 maplebed: started copying data from all srv hosts with external storage databases to es1001. I expect it to run all weekend.
23:39 notpeter: putting srv289 back in the apache pool
23:12 logmsgbot: aaron synchronized wmf-config/CommonSettings.php 'added require for Math extension, checks $wmgUseMath and version compare with $wmfVersionNumber so it does nothing yet'
22:50 logmsgbot: aaron synchronized wmf-config/InitialiseSettings.php 'added wmgUseMath setting (on by default); not yet used'
21:27 AaronSchulz: fixed usage of $mwVersionNums in set-group-write2 (same double quotes bug the sync scripts had before)
20:29 AaronSchulz: fixed geshi on test2 & secure testwiki
20:25 logmsgbot: aaron ran sync-common-all
20:22 AaronSchulz: changed mwscript & mwscriptwikiset to use the /home version of scripts (people usually did this before and some things like TrustedXFF expect it)
20:07 logmsgbot: aaron synchronized wmf-config/InitialiseSettings.php 'Set wgAutopromoteOnceLogInRC for plwiki,plwikisource ahead of time - no effect yet - bug 29655'
14:53 domas: killed whoever was running full table scans with subselects on enwiki and hitting http://bugs.mysql.com/bug.php?id=46947 - thanks, you locked up enwiki for many users :)
12:26 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 30928 - Make Portal namespace for shwikipedia'
11:47 Reedy: Run namespaceDupes on hywiki
02:22 logmsgbot: LocalisationUpdate completed (1.17=aawiki 1.17-test) at Fri Sep 16 02:24:43 UTC 2011
14:17 logmsgbot: catrope synchronized php/extensions/LiquidThreads/classes/Hooks.php 'More live hacks for fatals that Ariel has been seeing in the dumps'
14:10 logmsgbot: catrope synchronized php/extensions/LiquidThreads/classes/Hooks.php 'Live hack to work around fatal'
14:08 logmsgbot: catrope synchronized php/extensions/LiquidThreads/classes/Hooks.php 'Live hack to work around fatal'
14:07 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 30832 - Set $wgOverrideSiteFeed to use Special:NewsFeed from GNSM extension, on Persian Wikinews (fawikinews)'
14:01 Reedy: Created ArticleFeedback tables on ptwiki
02:24 logmsgbot: LocalisationUpdate completed (1.17) at Tue Sep 13 02:27:04 UTC 2011
01:16 neilk_: just deployed a config change (wmf-config r2266) that should fix bug #30797
01:13 logmsgbot: neilk synchronized wmf-config/CommonSettings.php 'the URL for the UploadStash image scaler should not be protocol-relative -- this is invoked with curl, behind the scenes to fetch a scaled image.'
00:48 notpeter: putting srv223 back into rendering pool
September 12
23:37 Ryan_Lane: add CVE-2011-3192 workaround to noc.wm.o
02:53 logmsgbot: tstarling synchronized php-1.17/extensions/OggHandler/OggHandler_body.php 'live patch to fix total breakage of OggHandler due to protocol-relative upload URLs'
02:45 logmsgbot: tstarling synchronized php-1.17/includes/SquidUpdate.php 'emergency patch for bug 30792'
02:21 logmsgbot: LocalisationUpdate completed (1.17) at Wed Sep 7 02:24:09 UTC 2011
01:40 logmsgbot: asher synchronized 503.html 'running sync-file to test new wikimedia-task-appserver'
22:20 logmsgbot: ben synchronized wmf-config/mc.php 'moved mc servers that passed mctest.php from the down list to the spares list. Whitespace cleanup. Linked to memcache wikitech page. -ben'
22:13 binasher: wiped enwiki slave on db1048, re-enabled binlogging
20:06 Jeff_Green: rebooting db1008 for kernel update
19:58 logmsgbot: catrope synchronized wmf-config/InitialiseSettings.php 'Make wgUploadPath protocol-relative for commons, meta and test'
19:22 logmsgbot: catrope synchronized wmf-config/InitialiseSettings.php 'Enable HTTPS on meta'
16:21 RoanKattouw: Restarting indexer on searchidx2
16:20 RoanKattouw: Installing new build of LuceneSearch.jar on searchidx2. Contains fix for protocol-relative URLs that only applies to the indexer
15:48 logmsgbot: catrope synchronized php-fatal-error.html 'Update for r96283'
15:42 logmsgbot: catrope synchronized live-1.5/404.php 'Per bug 30733, fix HTTPS detection and make URLs protocol-relative. Grr, why is this file not versioned?'
13:30 mutante: sq31 shutdown -> RT #1431 (degraded RAID, just on one disk and squid would not run anyways)
11:34 mutante: srv266 - power cycle (too often since July)
11:25 mutante: sq76 - power up,dist upgrade/kernel,squid clean,reboot
11:12 mutante: also cleared paniclogs on mchenry (spamd socket fail on Aug31) and lily (temp. failure to write log due to disk space, see Tim's log entries Sep 2nd)
11:05 mutante: reviewed paniclog on srv207 due to non-zero size. ran out of mem once on Aug 11, cleared logs to stop email alerts
10:43 mutante: sq55 - power up,dist upgrade/kernel,squid clean,reboot
02:27 logmsgbot: LocalisationUpdate completed (1.17) at Mon Sep 5 02:29:45 UTC 2011
September 4
16:51 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'From bug 29940, kill MW/MWT namespace aliases from hiwiki. MW is an interwiki, and mwt is a languagecode'
02:27 logmsgbot: LocalisationUpdate completed (1.17) at Sun Sep 4 02:30:04 UTC 2011
September 3
02:19 logmsgbot: LocalisationUpdate completed (1.17) at Sat Sep 3 02:22:07 UTC 2011
18:17 preilly: fix a caching issue with enable/disable images links
18:10 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Synchronise wmgWikiLoveDefault with wmgUseWikiLove, seems most places want it on by default, not off by default'
17:21 Ryan_Lane: removed duplicate ip in badbadip list in squid
16:31 mutante: srv266 - power cycle
16:22 mutante: amssq38 - power up,dist-upgrade/kernel,squid clean,reboot
12:31 Tim: on db9 and db10, raised max_connect_errors from 10 to 1 billion so that it will stop blocking kaulen. These servers are not publicly accessible, nobody can SYN flood them
12:25 Tim: ran "flush hosts" on db9 to temporarily fix bugzilla connection error
09:22 RoanKattouw: Reverted yesterday's changes on srv162 (disabled core dumps and suid_dumpable)
02:22 logmsgbot: LocalisationUpdate completed (1.17) at Tue Aug 30 02:25:02 UTC 2011
01:16 binasher: upgrading varnish on mobile caching servers to 3.0.0-1wmf5
00:19 binasher: added new udplog-1.7 pkg to lucid-wikimedia repo, will be upgraded everywhere via puppet on next run
August 29
22:01 awjr: updating payflowpro_gateway.i18n on payments1-4 - r95708
15:09 mutante: srv207 - was unusable due to overload/freeze - powercycle, dist-upgrade/kernel, puppet run, reboot (log entry from July 31st (RAID issues) not confirmed)
14:57 mutante: srv266 started apache
14:51 mutante: srv281 - power up, dist-upgrade/kernel, puppet run, reboot (note: see 'srv281' in comments of RT#22 and Server_admin_log)
14:39 mutante: srv278 - power up, dist-upgrade/kernel, puppet run, reboot
14:28 mutante: srv266 - power up, dist-upgrade/kernel, puppet run, reboot
14:08 mutante: srv217 - power up, dist-upgrade/kernel, puppet run, reboot
14:06 mutante: nagios-wm - ok, just needed restart to talk again
13:54 mutante: srv188 - power up, dist-upgrade/kernel, puppet run, reboot
13:45 mutante: nagios-wm is on channel but does not speak!? (not ignoring it)
13:45 mutante: srv174 - confirmed hardware failure, new RT#1379, acked in Nagios
13:29 mutante: srv156 - power up, dist-upgrade/kernel, puppet run, reboot
12:57 RoanKattouw: Reverted all of my changes to srv162 and started puppet again. Need to do more to get a core dump, will do that later
09:26 RoanKattouw: ... on srv162
09:26 RoanKattouw: Changed the core dump directory to /a/tmp/apachecore because the root partition doesn't have much free space but /a does
09:23 RoanKattouw: Set up Apache core dumping on srv162 *correctly* by uncommenting CoreDumpDirectory /tmp/apache-core locally in /etc/apache2/wmf/main.conf
09:03 RoanKattouw: Changed ownership of /mnt/upload6/math/8/0/0/800618943025315f869e4e1f09471012.png from root:root to apache:apache, permissions errors were causing PHP warnings
07:39 RoanKattouw: Reverted my changes on srv163 and started puppet
07:38 RoanKattouw: Stopped puppet on srv162, set Apache's cwd to /a/tmp/apachecore in /etc/apache2/envvars , and set ulimit -c 1000000 in /etc/default/apache2
07:34 RoanKattouw: Moving my core dump for segfault debugging test to srv162 instead of srv163, for disk space reasons
07:32 RoanKattouw: Stopped puppet on srv163 to prevent it from reverting my hacks
07:26 RoanKattouw: Restarting Apache on srv163 so these changes take effect
07:26 RoanKattouw: Enabled core dumps for Apache on srv163 by editing /etc/default/apache2
07:19 RoanKattouw: Changing Apache's cwd on srv163 by editing /etc/apache2/envvars
02:18 logmsgbot: LocalisationUpdate completed (1.17) at Mon Aug 29 02:20:17 UTC 2011
August 28
17:55 logmsgbot: ariel synchronized php-1.17/includes/upload/UploadFromStash.php 'fix fatal Call to a member function getId() on a non-object'
02:23 logmsgbot: LocalisationUpdate completed (1.17) at Sun Aug 28 02:26:07 UTC 2011
August 27
02:21 logmsgbot: LocalisationUpdate completed (1.17) at Sat Aug 27 02:23:37 UTC 2011
August 26
21:06 mutante: amssq48 - power back up, clean squid, dist-upgrade
16:37 robh: updating text-settings to move sq36 into the squid api cluster. puppet updated already for the same, and pybal updated to remove sq36 frontend from normal text service
02:27 logmsgbot: LocalisationUpdate completed (1.17) at Fri Aug 26 02:29:15 UTC 2011
00:11 robh: change reverted, nothing bad, but undesired result. hooper back to normal
00:09 robh: hooper apache config change for https redirection on etherpad
00:09 robh: i meant to paste the rt link
00:09 robh: testing something in hooper apache config, should result in nothing noticeable to users, unless i did it wrong.
00:08 maplebed: changed puppet client run interval from the default (30m) to 2hrs to reduce load on the master.
19:49 RoanKattouw: Run scap to deploy HTTPS / prot rel fixes
19:22 awjr: deployed contrib log auditing fixes on Grosley
19:20 logmsgbot: catrope synchronized wmf-config/secure.php 'Factor out URL fixing into fixupUrl, add hook functions for GetCanonicalURL and IRCLineURL'
19:12 logmsgbot: catrope synchronized wmf-config/secure.php 'Fix setting of $wgInternalServer for protocol-relative URL wikis (was broken since introduction of prot rel) and set $wgCanonicalServer'
10:45 logmsgbot: catrope synchronized wmf-config/CommonSettings.php 'Comment out $wmincClosedWikis'
10:38 logmsgbot: catrope synchronized wmf-config/InitialiseSettings.php 'Re-enable WikimediaIncubator extension so I can do some profiling'
10:37 logmsgbot: catrope synchronized wmf-config/StartProfiler.php 'Add profiling group for debugging incubatorwiki slowness, triggered by a request parameter'
10:30 logmsgbot: catrope synchronized wmf-config/InitialiseSettings.php 'Temporarily disable WikimediaIncubator extension due to slowness on incubatorwiki'
09:15 RoanKattouw: Running scap to undo the accidental deployment of yesterday's failed UploadWizard deployment
02:19 logmsgbot: LocalisationUpdate completed (1.17) at Tue Aug 23 02:21:27 UTC 2011
00:28 ^demon: ran refresh-dblist and sync-dblist
00:21 ^demon: fixed refresh-dblist since wmf-config was moved
August 22
22:39 RoanKattouw: Added UploadWizard tables to testwiki, for real this time
22:32 preilly: pushing fix for disable link
22:32 logmsgbot: preilly synchronized php/extensions/MobileFrontend/views/information/disable.html.php 'issue with en always being the target of the form'
22:21 preilly: remove content type header for now from XHTML view on Mobile Frontend
22:21 logmsgbot: preilly synchronized php/extensions/MobileFrontend/MobileFrontend.php 'remove content type header for now'
22:07 notpeter: upgrading squid and squid-frontend on sq55
21:45 notpeter: upgrading squid and squid-frontend on sq52
11:28 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 30491 - Enable WikiLove on Swedish Chapter Wiki'
07:12 Tim: on ersch and alsted: set up a temporary firewall to block external access to miscellaneous swift services such as the object servers
06:41 Tim: on alsted and ersch: adjusted rsyncd.conf to deny access from outside the cluster and restarted rsync
06:22 Tim: on ersch: reconfigured MW to send its debug log to a place which is not on the public web, and did a chmod 000 in case the config is reverted
06:18 Tim: on ersch apache2.conf: deny access to vim editor backup files
02:21 logmsgbot: LocalisationUpdate completed (1.17) at Mon Aug 22 02:23:31 UTC 2011
August 21
18:41 logmsgbot: reedy synchronized php-1.17/extensions/OAI/OAIRepo_body.php 'r95170 for bug 26304'
18:36 rainman_: restarted lucene on search7 which somehow got hung
02:17 logmsgbot: LocalisationUpdate completed (1.17) at Sun Aug 21 02:20:08 UTC 2011
August 20
11:44 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'bug 29834 - Reconfigured AF for hiwiki'
02:21 logmsgbot: LocalisationUpdate completed (1.17) at Sat Aug 20 02:23:29 UTC 2011
August 19
23:43 logmsgbot: asher synchronized wmf-config/db.php 'adding srv155 back to ext store'
13:10 Reedy: Fail summary. Bug 29834 - Enable Extension:ArticleFeedback on Hindi Wikipedia. Also, add wmg global for $wgArticleFeedbackBlacklistCategories
13:10 logmsgbot: reedy synchronized wmf-config/CommonSettings.php 'Bug 29834 - Enable Extension:ArticleFeedback on Hindi Wikipedia. Also, add wmg Global for Bug 29834 - Enable Extension:ArticleFeedback on Hindi Wikipedia'
13:09 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 29834 - Enable Extension:ArticleFeedback on Hindi Wikipedia. Also, add wmg Global for Bug 29834 - Enable Extension:ArticleFeedback on Hindi Wikipedia'
11:07 Andrew: (in a screen on hume)
11:07 Andrew: Sending 753994 emails to qualified voters in the referendum.
10:57 mark: Converted routing subinterfaces on cr2-pmtpa:ae0 to IRB and configured bridge-domains
02:19 logmsgbot: LocalisationUpdate completed (1.17) at Fri Aug 19 02:21:42 UTC 2011
01:17 Jeff_Green: flipped $wgContributionReportingDBserver back to db10
18:32 binasher: pulling srv264 from lvs for testing
18:27 logmsgbot: reedy synchronized php-1.17/includes/parser/ParserCache.php 'Add in a not saved in parser cache message if appropriate'
18:20 logmsgbot: reedy synchronized wmf-config/CommonSettings.php 'Log parsercache log groups to file'
18:18 logmsgbot: reedy synchronized php-1.17/includes/parser/ParserCache.php 'Hack in some logging for when parser output is uncacheable'
14:26 logmsgbot: catrope synchronized wmf-config/CommonSettings.php 'Keep $wgInternalServer on http when visiting wikis that do not have a protocol-relative $wgServer over https. For bug 29925'
16:23 mark: Increased prefix limit on BGP session between br1-knams and csw2-esams from 10 to 50 prefixes
16:00 RobH: lvs1004, lvs1005, lvs1006 both network patches attached
15:57 mark: Added Nagios purge script on 'reload' action in its init script
15:44 mark: Setup switchports for cross rack row connections from lvs1001-1006 on asw-a-eqiad and asw-b-eqiad
15:40 Jeff_Green: package upgrade on payments4, no reboot
14:21 mark: Noticed the mobile LVS service had a depool-threshold set at '1.5', preventing it from depooling any hosts, ever. Corrected it by setting it to .6
14:07 mark: Setup 'new-mobile' LVS service on lvs1001 and lvs1004
14:07 mark: Setup eqiad LVS cluster (classes 'high-traffic' and 'testing') consisting of lvs1001 (A4) and lvs1004 (B4); still awaiting cross-row connections
13:54 mark: Increased bgp group PyBal multihop ttl from 1 to 2 on all core routers
13:54 mark: Configured PyBal bgp group and policy statements on cr1/cr2-eqiad
13:14 RobH: reviewed exim paniclog on lily and williams, both from a refusal in spam acl, over 24 hours ago, and no issues since. cleared each paniclog back to 0
11:16 logmsgbot: catrope synchronized wmf-config/InitialiseSettings.php 'bug 30392: Add Wikipedia_talk and MediaWiki_talk aliases on zh-yuewiki'
10:42 mark: Corrected DNS for lvs1004-1006
10:03 mark: Setup switchports for lvs1001-1006
02:17 logmsgbot: LocalisationUpdate completed (1.17) at Tue Aug 16 02:19:43 UTC 2011
00:37 mutante: gmond.conf was an empty file on search12. copied it in place and restarted gmond
21:12 RoanKattouw: Ended up moving extensions/SecurePoll/auth-api.php.save to /home/catrope . I could do that because I have write access to the parent directory, even though I don't have read access to the file
21:07 RoanKattouw: Permissions for extensions/SecurePoll/auth-api.php.save are messed up, breaking scap. Running set-group-write in an attempt to fix it
17:53 mutante: added libnet-stomp-perl to puppet to be installed on spence
17:49 notpeter: upgrading squid and squid-frontend on knsq16 and knsq17
17:04 RobH: updating dns with lvs1001-1006 mgmt info
16:58 Ryan_Lane: pushing varnish changes to allow XFF from HTTPS cluster
15:39 logmsgbot: andrew synchronized php-1.17/extensions/SecurePoll/auth-api.php
15:38 logmsgbot: andrew synchronized php-1.17/extensions/SecurePoll/auth-api.php
15:36 logmsgbot: andrew synchronized php-1.17/extensions/SecurePoll/auth-api.php
15:34 logmsgbot: andrew synchronized php-1.17/extensions/SecurePoll/auth-api.php
15:25 RobH: updating dns for lvs1001-lvs1006
15:19 RobH: restarted dns on ns2
15:04 logmsgbot: andrew synchronized php-1.17/extensions/SecurePoll/includes/ballots/Ballot.php 'Deploy r94343'
15:04 logmsgbot: andrew synchronized php-1.17/extensions/SecurePoll/SecurePoll.php 'Deploy r94343'
15:03 logmsgbot: andrew synchronized php-1.17/extensions/SecurePoll/includes/ballots/RadioRangeCommentBallot.php 'Deploy r94343'
14:37 mark: Enabled NAT on the pmtpa management LAN
12:10 mark: Setup ongoing rsync of ms7 images to ms1002
12:03 mark: Setup rsyncd on ms7
02:17 logmsgbot: LocalisationUpdate completed (1.17) at Mon Aug 15 02:19:19 UTC 2011
August 14
17:49 JeLuF: rebooted srv183
17:30 JeLuF: stopped apache and jobrunner on srv183, logfiles indicate memory problems
17:18 logmsgbot: jeluf synchronized wmf-config/mc.php 'replace srv183 by srv229'
13:22 logmsgbot: catrope synchronized wmf-config/InitialiseSettings.php 'Make $wgServer protocol-relative on wikimania2005wiki, for consistency with the other HTTPS experiment wikis'
10:12 JeLuF: rebooting sq40 using echo b > /proc/sysrq-trigger, syslog shows lots of SCSI errors
09:10 apergos: stopped and restarted lsearchd on search7. had to shoot the old one, it wouldn't die
02:19 logmsgbot: LocalisationUpdate completed (1.17) at Sun Aug 14 02:21:23 UTC 2011
August 13
21:23 RoanKattouw: sq40 keeps flapping in Nagios all the time
10:39 RoanKattouw: ...on enwiki
10:39 RoanKattouw: Running a hacked-up version of extensions/ArticleFeedback/populateAFRevisions.php to fix bug 30227 (data corruption in ArticleFeedback tables), using a list of affected revisions produced on the toolserver
02:16 logmsgbot: LocalisationUpdate completed (1.17) at Sat Aug 13 02:18:42 UTC 2011
00:22 notpeter: upgrading squid and squid-frontend on knsq12 and knsq15
00:19 notpeter: upgrading squid and squid-frontend on knsq10 and knsq11
August 12
23:15 preilly: pushing fix for mobile WML view and disable caching on cookie set page
21:31 maplebed: localization update permissions changes complete and tested.
21:29 logmsgbot: LocalisationUpdate completed (1.17) at Fri Aug 12 21:31:45 UTC 2011
21:26 AaronSchulz: Running LU to confirm permission fix
21:25 notpeter: upgrading squid and squid-frontend on amssq56 and amssq58
21:17 logmsgbot: LocalisationUpdate completed (1.17) at Fri Aug 12 21:19:39 UTC 2011
21:10 maplebed: changing ownership of /usr/local/apache/common-local/php-1.17/cache/l10n from nagios to mwdeploy on all affected srv hosts
21:03 binasher: deploying new squid frontend.conf - bypass mobile redirector in case of trial opt-in/out pages
20:53 notpeter: upgrading squid and squid-frontend on amssq51 and amssq52
20:52 logmsgbot: LocalisationUpdate completed (1.17) at Fri Aug 12 20:54:15 UTC 2011
18:44 maplebed: made ariley an account on rt to submit tickets for email list maintenance
18:24 JeLuF: DNS: added bugs.mediawiki.org as alias to text.wikimedia.org
17:25 logmsgbot: jeluf synchronized wmf-config/InitialiseSettings.php '30268 - Point eowp logo to Wiki.png'
16:25 logmsgbot: mark synchronized wmf-config/CommonSettings.php 'Raise multicast ttl from 2 to 8'
16:10 logmsgbot: catrope synchronized wmf-config/CommonSettings.php 'Set $wgVaryXFPForAPI = true for HTTPS experiment wikis. This splits the API Squid cache between HTTP and HTTPS, fixing cache pollution issues'
15:08 mark: Turning off ethernet hw offloading GRO on all lvs servers with Puppet
14:30 mark: Turned off all forms of hardware segmentation on lvs4, fixing the slow upload problem
14:29 mark: Turned tcp segment offloading back on on sq51..86
13:57 mark: Manually turned off TCP segmentation offloading on sq51..86
12:59 logmsgbot: catrope synchronized wmf-config/StartProfiler.php 'Remove upload profiling, hasn't produced any useful data'
12:52 logmsgbot: catrope synchronized wmf-config/StartProfiler.php 'Profile uploads on officewiki separately, I wanna try something'
11:39 apergos: reran puppet by hand on spence, sq32 entries in nagios conf files were not recreated, restarted nagios, seems to be running
11:11 apergos: er... because sq32 is in the decommissioned list but the script to purge resources from nagios is broken right now, which means nagios fails to start
11:10 apergos: purged sq32 resources and host references from puppet db manually on db9, and from nagios conf files on spence. will run puppet manually on spence shortly
10:52 Andrew: (screen on hume)
10:52 Andrew: Running populatePifEditCount.php on all wikis
10:45 Andrew: Adding pif_edits table to all wikis for personal image filter voter list
10:39 mark: Enabled cr1-sdtpa:xe-0/0/2; a cross connect has been ordered, expect Nagios to complain
10:24 apergos: uncommented the monitor_group line in varnish.pp which defines the cache_mobile_eqiad group in puppet (thanks ma rk), will run puppet shortly on spence
08:44 apergos: revert change to site.pp, try applying to spence
08:29 apergos: doing repeated manual runs of puppet on spence til we catch up to current config (it is quite out of date)
07:21 apergos: nagios was failing to start because of unknown host group cache_mobile_eqiad in /etc/nagios/puppet_hosts.cfg; commented out line $nagios_group = "cache_mobile_${site}" in site.pp, waiting for puppet run to complete on spence
02:15 logmsgbot: LocalisationUpdate completed (1.17) at Fri Aug 12 02:17:38 UTC 2011
August 11
21:24 JeLuF: added 'ttf-ubuntu-font-family' to the list of required packages for image scalers in puppet (bug 30288)
21:07 JeLuF: virt2 root filesystem has switched to read-only due to a disk failure
21:03 logmsgbot: LocalisationUpdate completed (1.17) at Thu Aug 11 21:06:09 UTC 2011
20:24 binasher: re-pooling mobile2
20:16 logmsgbot: LocalisationUpdate failed
18:57 binasher: depooling mobile2 from lvs for mobile extension opt in proxy conf testing
18:19 preilly: pushing new mobile frontend changes to production
13:37 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'bug 18390 Set [bureaucrat][] = sysop on enwiki'
13:34 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'bug 30307 Allow bureaucrats to remove sysop flag on Russian Wikipedia'
12:45 mark: Migrated nas1-a and nas1-b to new 64 bit root volumes
09:45 logmsgbot: catrope synchronized wmf-config/checkers.php 'Remove broken profiling in wfACLBlocks. Not needed because everything in $wgExtensionFunctions is automatically profiled anyway'
09:17 mutante: sq75 - power cycle, squid clean
03:00 logmsgbot: LocalisationUpdate completed (1.17) at Thu Aug 11 03:02:09 UTC 2011
02:32 logmsgbot: LocalisationUpdate completed (1.17) at Thu Aug 11 02:34:20 UTC 2011
02:31 Reedy: LocalisationUpdate is giving Warning: array_diff_assoc(): Argument #1 is not an array in /home/wikipedia/common/php-1.17/extensions/LocalisationUpdate/LocalisationUpdate.class.php on line 373
01:58 logmsgbot: LocalisationUpdate completed (1.17) at Thu Aug 11 02:00:53 UTC 2011
01:56 logmsgbot: LocalisationUpdate completed (1.17) at Thu Aug 11 01:58:21 UTC 2011
01:46 logmsgbot: LocalisationUpdate completed at Thu Aug 11 01:48:34 UTC 2011
01:46 logmsgbot: LocalisationUpdate failed
01:30 logmsgbot: LocalisationUpdate failed
01:27 logmsgbot: LocalisationUpdate failed
01:12 logmsgbot: LocalisationUpdate failed
00:42 logmsgbot: demon synchronized wmf-config/StartProfiler.php 'Enabled profiling group for commonswiki'
August 10
23:02 logmsgbot: LocalisationUpdate failed
22:59 logmsgbot: LocalisationUpdate failed
22:50 aaron: testing/debugging Het Deploy updates to l10n script
21:25 preilly: pushing device change back into production
17:27 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'fix language code for bug 30045'
17:15 Ryan_Lane: RobH is removing the sep11.wikipedia.org subdomain redirect. It didn't post to the SAL, and he can't get onto IRC
13:53 mark: Fixed IPv6 router advertisements on cr1-sdtpa
11:57 mark: Killed squid on knsq30
11:52 mutante: restarted squid on knsq30
11:29 mark: Restored routing of pmtpa - esams path
11:19 mark: Restarted lsearchd on search7
02:18 logmsgbot: LocalisationUpdate completed at Mon Aug 8 02:21:05 UTC 2011
August 7
07:56 mark: Rerouted pmtpa <--> esams traffic in both directions
02:22 logmsgbot: LocalisationUpdate completed at Sun Aug 7 02:24:47 UTC 2011
August 6
15:52 mutante: amssq41 - and while we're on it: dist-upgrade incl. kernel and reboot. did not include squid package, that was downgraded on Apr 20
15:25 mutante: amssq41 - power back up after it went down, clean squid cache
13:07 RobH: puppet run on spence is via screen on root user
13:05 RobH: spence is running puppet to update with new nagios info that won't contain the facilities false reports. if it fails to work, ~/rob_puppet_services.cfg is on spence
13:03 RobH: nagios is being regenerated, i hope this works!
12:23 RobH: srv169 depooled, needs work and to be restored to service
12:10 logmsgbot: root synchronized wmf-config/mc.php 'replacing srv169 in active mc role due to its constant flapping'
06:46 RobH: emptied the /var/log/exim4/paniclog on project2, formey, lily, mchenry, sockpuppet, williams, srv196. All errors were quite old due to various network blips over time.
02:19 logmsgbot: LocalisationUpdate completed at Sat Aug 6 02:21:34 UTC 2011
01:49 Reedy: srv169 seems to be flapping
01:48 Reedy: Ganglia seems brokened "There was an error collecting ganglia data (127.0.0.1:8654): fsockopen error: Connection refused"
August 5
23:50 tfinc: removed knsq30 from LVS esams rotation
23:36 Reedy: Even though Asher pulled knsq30 from rotation, it seems to be still serving users
02:31 logmsgbot: aaron synchronized wmf-config/CommonSettings.php 'renamed versionNumber variables to have wmf prefix'
02:25 logmsgbot: LocalisationUpdate completed at Fri Aug 5 02:27:52 UTC 2011
00:59 notpeter: knsq30 seems to have some kind of hardware failure. stopping squid and squid frontend proc
00:48 AaronSchulz: /home/wikipedia/bin version with a working repo again
August 4
23:57 binasher: deploying mobile squid acls with nokia and netfront enabled for a brief test to see if today's mobile gateway changes will allow it to support the load
23:10 binasher: squid mobile redirect deployment completed
22:38 binasher: mobile redirect deploy complete to all pmtpa squids. mobile cluster looks good, starting amsterdam
22:17 binasher: starting an extra slow rollout of mobile redirect acls to frontend squids, minus nokia and netfront
21:43 binasher: varnish 3.0 is now on all RoR mobile gw servers - only caching static non-article content
21:33 binasher: mobile2 is back / varnish for static content / apache relax
19:52 binasher: pulling mobile2 from lvs for testing
15:40 mark: Mounted nas1001-a:/vol/images on ms1002:/vol/images for testing, started rsync copy from ms1002:/export/upload to /vol/images
15:20 mark: Created 12-disk aggregate test0 on nas1001-a
15:01 RoanKattouw: Staging some HTTPS support / prot rel URL code on testwiki, see today's 1.17wmf1 merges. DO NOT DEPLOY SITEWIDE until tomorrow
13:19 RobH: updated wikibugs on mchenry about 10 minutes ago and forgot to admin log
13:17 mark: Created new 64 bit aggregate and root volume on nas1001-a, copied over root data, destroyed old root volume, moving new aggregate to first disks
13:00 RobH: disabled line numbers in vimrc.local on puppet because i hate line numbers (it already says the line number yer on in the corner)
12:21 mark: Started disk replace operation of 0a.10.0 by 0a.10.3 and 0a.10.1 by 0a.10.4 on nas1-b
12:12 mark: Started disk replace operation of 0b.00.0 by 0a.01.1, and 0b.00.1 by 0a.01.2 on nas1-a
12:21 mark: Rebooting both nas1001 controllers with corrected network config
10:02 logmsgbot: root synchronized wmf-config/mc.php 'swapped out all down servers returning 0 with ones returning 100, cleaned up spares and down list to be accurate'
09:44 RobH: swapping out down memcached servers with tested up ones
09:44 RobH: incorrect, turns out there are some down, they reply but do not return full weight
09:44 logmsgbot: robh synchronized wmf-config/mc.php 'replacing down servers'
09:40 RobH: nagios shows spence memcached error, yet manual testing shows all memcached are pooled fine.
09:01 Ryan_Lane: installing ssl1001-4
08:09 apergos: live hack to /usr/share/etherpad/etherpad/bin/run-local.sh on hooper to force maxthreads to 512 for etherpad. restarted etherpad, this may or may not help connection issues (maybe we'll just oom instead?)
11:04 logmsgbot: laner synchronized wmf-config/InitialiseSettings.php 'Adding sitename for ladwiki, per bug 30181'
10:16 mark: Restarting all PyBal instances in a controlled manner
10:13 mark: Split LVS class high-traffic into high-traffic1 (text, bits) and high-traffic2 (upload), and made a separate class for https with BGP disabled
09:29 Ryan_Lane: powercycling srv281
08:14 mark: Increased TTL of bits.pmtpa.wikimedia.org from 60 to 1H
08:09 mark: Changed IP address of upload.pmtpa.wikimedia.org from .3 to 208.80.152.211
08:03 mark: Increased etherpad memory from 1G to 2G in /usr/local/etherpad/bin/run.sh
07:39 logmsgbot: tstarling synchronized wmf-config/CommonSettings.php 'in parser cache multiwrite configuration, putting memcached first instead of mysql'
05:51 mark: Restarted nginx on yvon and gurvin; apparently nginx needs to be restarted after ip addresses changes, as it binds each individual address instead of IN_ADDR_ANY
04:23 notpeter: stopping puppet on spence temporarily
02:21 logmsgbot: LocalisationUpdate completed at Tue Aug 2 02:23:56 UTC 2011
00:26 AaronSchulz: deployed r93695 to fix WURFL infinite loop
13:40 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Bug 30003 - Please set wgUploadNavigationUrl + wgUploadMissingFileUrl for nl.wikisource.org nl.wikiquote.org nl.wiktionary.org'
13:34 mark: Upgraded cr2-pmtpa to JUNOS 10.4R6.5
13:24 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'bug 30045 define transwiki import source for kmwiki from be, en and simple'
13:16 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'bug 30065 replace the Hebrew Wikipedia logo'
13:04 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'bug 30085 Name update for cy.wikibooks.org'
02:16 logmsgbot: LocalisationUpdate completed at Fri Jul 29 02:18:38 UTC 2011
00:10 AaronSchulz: Updated wikimedia-periodic-update (for FR) to use MW script wrapper
00:05 AaronSchulz: Updated mw-central-notice, mw-tor-list to use MW script wrapper
July 28
23:36 AaronSchulz: Updated update-special-pages, update-special-pages-small, jobs-loop.sh to use MWScript.php
23:06 binasher: power cycled srv 173
22:56 logmsgbot: reedy synchronized wmf-config/db.php 'srv173 is dead to the world commenting out'
22:52 Reedy: srv173 seems deaded
22:40 logmsgbot: hashar: made proofreadpage.php configuration publicly available through http://noc.wikimedia.org/conf/ . Usual notice is at the top of the file. Requested by phe (frwiki).
18:50 logmsgbot: LocalisationUpdate completed at Thu Jul 28 18:52:41 UTC 2011
18:46 notpeter: pushing out new dns zone file (only changing cname survey to point to argon.w.o)
18:40 Reedy: LocalisationUpdate seems to be working correctly now
18:35 Reedy: Fixed localisationupdate for FR, added symlink in /home/wikipedia/l10n/trunk/extensions/FlaggedRevs for the language folder to presentation/language
18:32 notpeter: removing survey.w.o from sites enabled on singer for limesurvey migration, and doing a graceful on singer
18:22 logmsgbot: reedy synchronized wmf-config/CommonSettings.php 'Remove old ArticleAssessment stuff'
18:21 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Remove old ArticleAssessment stuff'
13:40 Reedy: Do not push any configuration change from wmf-config without speaking to Reedy/Ariel first
13:33 Reedy: Seems some configuration files between r2087 and r2088 weren't pushed to site, pushing them broke most projects. Aaron to investigate when he's in the office
12:36 logmsgbot: reedy synchronized wmf-config/CommonSettings.php 'Reverting last push'
12:36 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Reverting last push'
12:32 logmsgbot: reedy synchronized wmf-config/InitialiseSettings.php 'Reverting last push'
12:31 logmsgbot: reedy synchronized wmf-config/CommonSettings.php 'Reverting last push'
12:27 andrewS: miguel gave OK to fill up d1 in sdtpa as the heat flow issues have been temporarily resolved (also said construction on new AC units was to begin Aug 1st but has not heard anything yet).
11:15 Jeff_Green: adjusted firewall rules on payments servers per RT #1223
10:28 apergos: chowned /home/w/logs/*.log to udp (was root, /usr/local/bin/demux.py on nfs1 was failing to write to these files since July 21). This is probably the wrong solution but I don't know what the right setup is; can someone who does, make it right?
08:55 andrewS: 74th r410 dell server has arrived in tampa. uploaded packing slip to rt#611.
06:13 Jeff_Green: repooling payments3 & payments4, they're dist-updated and tested
05:05 logmsgbot: aaron synchronized php/maintenance/jobs-loop.sh 'added --wiki to another spot'
18:40 AaronSchulz: CommonSettings.php is now at CommonSettings-new.php for now, the former is just a wrapper (making it easy to go back to the old one)
09:36 logmsgbot: aaron synchronized php-1.17/extensions/CategoryTree/CategoryTree.php 'live hack to efCategoryTreeAjaxWrapper to shut up $option notices'
02:57 logmsgbot: catrope synchronized php/extensions/CategoryTree/CategoryTree.php 'remove debugging code; no point as long as the log collector is broken'
02:56 logmsgbot: catrope synchronized php/extensions/CategoryTree/CategoryTree.php 'debugging a PHP warning'
02:54 logmsgbot: catrope synchronized php/extensions/CategoryTree/CategoryTree.php 'debugging a PHP warning'
20:28 logmsgbot: catrope synchronized wmf-config/InitialiseSettings.php 'Raise AFT percentage to 77%'
20:27 logmsgbot: catrope synchronized wmf-config/CommonSettings.php 'Lower AFT clicktracking to 0.35%. Also includes lots of whitespace changes and an unused global removal by Aaron'
19:52 RobH: ms1004 rt1164 passes full dell hardware testing suite
19:08 notpeter: upgrading squid and squid frontend on amssq46
18:56 logmsgbot: jeluf synchronized wmf-config/InitialiseSettings.php '29971 - Set wgUploadMissingFileUrl for nlwiki'
17:26 Ryan_Lane: powering off srv173, since it is dead, and is messing up the stats
16:33 logmsgbot: aaron synchronized wmf-config/CommonSettings.php 'unused var in wfNoDeleteMainPage'
16:18 mark: Deployed lvs3 as backup LVS balancer for 'low-traffic' class LVS services, backing up lvs4
15:22 RobH: lvs3 reinstalled, puppet not run
15:07 RobH: updated dns again for more changes pertaining to lvs
15:02 RobH: updated dns to add lvs3.wikimedia.org to forward dns, was already in reverse file.
14:56 RobH: lvs6 reinstalled, puppet not synced and not signed as it may need further work
14:50 mark: Added new service IPs for bits and upload to varnish and squid classes in puppet
14:43 mark: Allocated out of subnet IPs for bits and upload.pmtpa
14:35 RobH: lvs3, lvs5, & lvs6 being (re)installed and setup, please disregard any warnings.
14:34 mark: Redeployed RunCommand monitor on lvs4, now fixed
14:33 RobH: lvs3 being reinstalled
July 20
22:51 Ryan_Lane: started memcached on srv260
21:53 Ryan_Lane: added monitoring for esams and pmtpa -lb addresses
21:10 Jeff_Green: db10 mysqldump is done, replication restarted
21:05 Ryan_Lane: adding lvs IPs for all esams -lb addresses to amslvs servers
18:59 logmsgbot: awjrichards synchronized wmf-config/CommonSettings.php 'Variable-izing enabling of ContactPageFundraiser extension, enabling on testwiki and foundation wiki'
18:56 logmsgbot: awjrichards synchronized wmf-config/InitialiseSettings.php 'Variable-izing enabling of ContactPageFundraiser extension'
17:40 RobH: lvs3 racked, but powered off. mgmt confirmed working, network port confirmed up
16:09 mark: Disabling RunCommand monitor on lvs4; not working right
15:50 mark: Restarted PyBal on lvs4
15:27 RobH: powercycling srv283
15:27 RobH: srv283 is hung up again, and now won't take my commands in serial console...
15:22 RobH: srv283 loads are insane, it's very sluggish, and only serial console works. kicking the apache process over to free up memory and resume normal function
20:58 logmsgbot: catrope synchronized wmf-config/CommonSettings.php 'Set up Category:Article Feedback Blacklist as an AFT blacklist category on enwiki'
20:45 mark: Flipped all backup static routes for LVS service ips from lvs3 to lvs4
17:12 aaron: copied wmf-config up one directory (to common/)
16:08 RobH: ms1004 started hardware testing. system is using the built-in diag tests for dell, which do NOT work via serial redirection. Thus ms1004 appears offline, but is running tests. DO NOT REBOOT IT
14:51 logmsgbot: reedy synchronized php-1.17/wmf-config/InitialiseSettings.php 'bug 29942 Enable UploadNavigationUrl for Hindi Wikipedia'
14:49 logmsgbot: reedy synchronized php-1.17/wmf-config/InitialiseSettings.php 'bug 29942 Enable UploadNavigationUrl for Hindi Wikipedia'
12:49 logmsgbot: ariel synchronized php-1.17/wmf-config/abusefilter.php 'disable abusefilter-private for es wikt, it would reveal ips, see bug 29910'
07:34 logmsgbot: jeluf synchronized php-1.17/wmf-config/InitialiseSettings.php '29648 - Set $wgCollectionHierarchyDelimiter = "/" for Wikibooks projects'
07:33 logmsgbot: jeluf synchronized php-1.17/wmf-config/CommonSettings.php '29648 - Set $wgCollectionHierarchyDelimiter = "/" for Wikibooks projects'
07:18 logmsgbot: jeluf synchronized php-1.17/wmf-config/InitialiseSettings.php '28221 - Romanian Wiktionary $wgSitename change and change of namespace name for extra namespace'
17:30 logmsgbot: catrope synchronized php-1.17/wmf-config/InitialiseSettings.php 'Disable Narayam on tawiki per bug 29798'
11:36 logmsgbot: reedy synchronized php-1.17/wmf-config/CommonSettings.php 'remove some old commented code'
04:56 Jeff_Green: rebooting db1008 after dist-upgrade
01:12 Ryan_Lane: restarting pybal on lvs2
00:42 Ryan_Lane: adding nginx configuration for all domains to the https clusters
00:41 Ryan_Lane: adding all -lb addresses to the https cluster for LVS. also adding the text.esams.wikimedia.org address to the esams https cluster, since that address will switch to a -lb address in the future.
00:33 Jeff_Green: installed packages on db1008 to support compiling mydumper: python-sphinx libglib2.0-dev libmysqlclient15-dev zlib1g-dev libpcre3-dev gcc make cmake g++
00:21 Ryan_Lane: adding wiktionary-lb, wikiquote-lb, wikibooks-lb, wikisource-lb, wikinews-lb, wikiversity-lb, mediawiki-lb, and foundation-lb addresses to geodns, and forward and reverse for esams and pmtpa
18:13 logmsgbot: laner synchronized wmf-deployment/cache/interwiki-pr.cdb 'Re-pushing the protocol relative interwiki cdb, since it had bad write permissions'
18:12 logmsgbot: laner synchronized php-1.17/wmf-config/InitialiseSettings.php 'Adding testwiki back into the https experiment'
18:07 logmsgbot: laner synchronized php-1.17/wmf-config/InitialiseSettings.php 'Taking testwiki out of https experiment'
18:04 logmsgbot: laner synchronized php-1.17/wmf-config/InitialiseSettings.php 'Changing stdlogopr to prstdlogo'
18:03 logmsgbot: laner synchronized php-1.17/wmf-config/CommonSettings.php 'Changing stdlogopr to prstdlogo'
17:59 logmsgbot: laner synchronized php-1.17/wmf-config/CommonSettings.php 'Adding stdlogopr, for protocol relative url stdlogo for https experiment'
17:57 logmsgbot: laner synchronized php-1.17/wmf-config/InitialiseSettings.php 'Putting testwiki into the https experiment'
17:33 RobH: replaced disk in db1006, fixed
17:07 logmsgbot: jeluf synchronized php-1.17/wmf-config/InitialiseSettings.php '29738 - add auto-patrolled group to he.wikiquote'
16:13 mark: Disabled STP on asw-d3-sdtpa
16:02 mark: Halted nas1001-a and nas1001-b for recabling
15:54 logmsgbot: jeluf synchronized php-1.17/wmf-config/InitialiseSettings.php '29798 - Install Narayam on Tamil wiki projects'
15:40 mark: Reran initial setup on nas1001-b, rebooting it now
15:30 mark: Reran initial setup on nas1001-a, rebooting it now
15:23 mark: Setup switch ports and LACP LAGs for nas1001
14:16 awjr: disabling puppet on grosley (including its cronjobs)
11:54 mark: Increased MTU on both pmtpa-eqiad links to 9192
11:32 mark: Renamed core-database puppet class into db::core
01:47 RoanKattouw: nfs1 overloaded because I tried to scp too much data to fenari at once. nfs1 and fenari have high wait CPU, /home kinda useless right now. This should sort itself out over time, right?
July 13
23:50 logmsgbot: laner synchronized php-1.17/wmf-config/CommonSettings.php 'Fixing the x-forwarded-proto check'
23:47 logmsgbot: laner synchronized php-1.17/wmf-config/CommonSettings.php 'Enabling wgSecureCookie for wikis in the https experiment'
23:13 RobH: more dns updates for same servers, forgot to set the mgmt to the new nice names instead of just the asset tag names
23:00 RobH: updating dns for new servers argon, chlorine, germanium, sulfur
15:17 RobH: pushed dns change to fix dns error for db12 mgmt
13:26 mark: Rebooting asw-b-sdtpa
13:09 mark: Temporarily disabled BGP aggregates announcement on cr1/cr2-eqiad
00:32 logmsgbot: laner synchronized php-1.17/wmf-config/CommonSettings.php 'Setting the powered-by link to a protocol relative url for wikis in https experiment'
00:08 Ryan_Lane: updated dumpInterwiki.php to add a protocolrelative option, which will output the interwiki cdb with all wikimedia interwiki urls set as protocol relative
19:03 Ryan_Lane: switching meta and commons from using text.wikimedia.org to use wikimedia-lb.wikimedia.org
18:56 Ryan_Lane: updated CommonSettings.php and InitialiseSettings.php, if you push changes right now, you're crazy ;)
15:42 logmsgbot: ariel synchronized php-1.17/includes/Import.php 'er, now with the svn updated file :-P'
15:41 logmsgbot: ariel synchronized php-1.17/includes/Import.php 'add LIBXML_PARSEHUGE to override 10b text node limit in recent versions of libxml'
15:41 logmsgbot: ariel synchronized php-1.17/maintenance/backupPrefetch.inc 'add LIBXML_PARSEHUGE to override 10b text node limit in recent versions of libxml'
09:42 JeLuF: created four new wikis, bugs 29456, 29715, 29758, 29796
20:14 logmsgbot: laner synchronized php-1.17/wmf-config/InitialiseSettings.php 'Using FQDN for lock, for udp clicktracking logging instead of the short name'
19:40 Jeff_Green: purged master logs before 20110611 on db9
19:18 notpeter: pushing out new zone file (only one minor change to cname)
19:00 logmsgbot: catrope synchronized php-1.17/wmf-config/CommonSettings.php 'CommonSettings code for wmfClickTrackingLog'
23:40 logmsgbot: reedy synchronized php-1.17/wmf-config/InitialiseSettings.php 'Revert bug 28686 per request'
23:32 logmsgbot: reedy synchronized php-1.17/wmf-config/abusefilter.php 'bug 29483 add some extra abusefilter lines'
23:23 logmsgbot: demon synchronized php-1.17/wmf-config/InitialiseSettings.php 'Actually sync my brwikimedia changes, turns out I pushed Reedy's changes for bug 29627 (dewiki). Yay conflict resolution'
23:18 Reedy: bug 29267 Namespace aliases for dewiki pushed in Chad's last push
23:15 logmsgbot: demon synchronized php-1.17/wmf-config/InitialiseSettings.php 'bug 28686: new namespaces for brwikimedia'
23:03 Reedy: run namespacedupes on mkwikisource
23:02 Reedy: run namespacedupes on rowikisource
22:54 logmsgbot: reedy synchronized php-1.17/wmf-config/InitialiseSettings.php 'bug 29190 NSs for rowikisource'
17:47 mark: Migrated routing to management gateway from csw1-sdtpa to cr1-sdtpa:ae0.402
17:34 mark: Moved transport link between csw1-sdtpa and cr1-eqiad to cr1-sdtpa:xe-0/0/1
17:06 RobH: All ops should be watching eqiad temps and humidity closely over the next few hours/days as the airflow is adjusted to lower relative humidity to acceptable ranges
16:44 mark: Migration completed
16:40 mark: Migrated routing of vlan 100 (squid+lvs) from csw1-sdtpa to cr1-sdtpa
16:28 mark: Migrated routing of vlan 2 (private) from csw1-sdtpa to cr1-sdtpa
16:16 mark: Migrated routing of vlan 101 (public services) from csw1-sdtpa to cr1-sdtpa
16:13 RobH: srv278 shutdown
16:10 mark: Migrated routing of vlan 102 (sandbox) from csw1-sdtpa to cr1-sdtpa
15:51 mark: Migrated routing of vlan 103 (virt) and vlan 104 (pub-svc2) from csw1-sdtpa to cr1-sdtpa
15:30 mark: Migrated routing of vlan 105 (virt-hosts) from csw1-sdtpa to cr1-sdtpa
15:29 mark: Moved transport link between csw1-sdtpa and cr2-eqiad to cr1-sdtpa port xe-1/1/0
15:01 RobH: db1023 drac fixed, had typo in ip allocation, its ready for os install, rt#1105
14:42 RobH: confirmed that high humidity readings in eqiad are valid, reopened ticket with eq support
14:35 mark: Powercycled asw-a5-sdtpa
07:25 Tim: removed the part of scap that copied skins to /mnt/upload6. This is no longer required, I added a redirect to bits instead.
03:56 Tim: in puppet, removed the subscribe from the rsyncd service for /home, since the init script is buggy and causes the service to go down (bind() failed: Address already in use). Rsyncd doesn't need it anyway, it re-reads the configuration file at each request.
03:34 Tim: on nfs1: changed rsyncd.conf to allow client IPs from a subnet that includes mobile1, since it looks like patrick tried to set up mediawiki on it. Ran sync-common on mobile1.
03:00 Tim: deleting and recreating /usr/local/apache/common-local on all servers, to get mwdeploy ownership
02:39 Tim: updated sudoers.appserver and /h/w/b/sync-common-file to support the sync script changes
02:33 Tim: updated wikimedia-task-appserver manually on searchidx1, searchidx2, fenari, hume. Manually removed the php.ini-dist diversion on searchidx1, see CR r76095
02:00 Tim: built wikimedia-task-appserver 2.2 for lucid, uploading to brewster, since we have some lucid boxes in production
01:35 Tim: uploading wikimedia-task-appserver 1.48 to brewster. Waiting for puppet to immediately wreak havoc in response.
01:25 Tim: re-enabled apache on srv255. Note that sync-file pushes to it will fail until I finish updating everything
01:17 Tim: did a test install of wikimedia-task-appserver version 1.48 on srv255
15:45 mark: Established full iBGP mesh between cr1-sdtpa, csw5-pmtpa, cr1-eqiad, cr2-eqiad (NOT csw1-sdtpa)
00:25 awjr: disabling fundraising-related squid log filtering for udp2log
July 6
23:45 Ryan_Lane: powercycling srv281
23:44 Ryan_Lane: powercycling srv266
23:44 Ryan_Lane: powercycling srv217
23:43 Ryan_Lane: powercycling srv206
23:42 Ryan_Lane: powercycling srv154
23:40 Ryan_Lane: powercycling srv276, it's dead
22:56 mark: Setup cr1-sdtpa with initial config; connected to csw5-pmtpa (via L2 csw1-sdtpa); OSPF up
22:36 RobH: wmf_ops: you can disregard the humidity alarms for eqiad that are spamming alerts to email. eq confirms no humidity issue on site and I will investigate the actual sensors this friday
22:21 Ryan_Lane: repooling srv169, it was missing the wikimedia-lvs-realserver package. fixed in puppet
21:30 logmsgbot: pdhanda ran sync-common-all 'Synced to r91606 for ArticleFeedback'
21:27 Ryan_Lane: setting proxy setting for secure back to original setting. removed ~ files from sites-enabled
21:17 Ryan_Lane: putting retry=3 back, and adding a timeout of 15 seconds to secure
21:16 Ryan_Lane: removed retry=3 from ProxyPass directive for secure. 3 seconds really isn't enough for this service...
21:06 RobH: running puppet on spence, this is going to take forever.
21:05 Ryan_Lane: restarting apache on singer
19:37 mark: Added DNS entries for cr1-sdtpa and cr2-pmtpa
19:25 logmsgbot: hashar: hexmode raised a user issue with blocking. It is a lock wait timeout happening from time to time on enwiki. 30 occurrences in dberror.log for Block::purgeExpired. Could not reproduce it so I am assuming it was a temporary issue.
19:15 logmsgbot: hashar: srv154 seems unreachable. dberror.log is spammed with "Error connecting to <srv154 IP>"
19:13 RobH: added webmaster@ to other top level domain mail routing to forward to the wikimedia.org webmaster for google securebrowsing stuff per RT#1122
18:08 pdhanda: running maintenance/cleanupTitles.php on commonswiki
17:51 pdhanda: Running maintenance/namespaceDupesWT.php on commonswiki
17:12 RobH: srv169 successfully back in service, tests fine and has all updated files, lvs3 updated to include it in pool
17:11 RobH: returning srv169 into service
15:37 mark: Removed ms5:/etc/cron.d/mdadm
15:37 mark: Stopped MD raid resync on ms5
15:28 RobH: search18 booted back up successfully
15:25 RobH: api lag issues known due to search server failure, being worked presently
15:24 RobH: search18 sas configuration bios confirms both disks are still in a non-degraded (according to it) mirror
15:23 RobH: search18 randomly rebooted after checking disks before the login prompt
15:19 RobH: rebooting search18
15:14 RobH: search18 appears to be completely offline, investigating lom logs before rebooting.
15:12 RobH: search18 offline, logging into mgmt to check it out
15:01 RobH: eqiad humidity levels ticket dispatched for fulfillment
14:37 mark: Paused rsyncs on ms5
14:04 mark: Powercycled sq36
13:18 ^demon|away: fixed permissions on svn c/o on ci.tesla, ran svn cleanup. cruise control still not pleased and yelling about locks
13:16 mark: Upgrading firmware of scs-a1-sdtpa
12:51 mark: csw5-pmtpa crashed and reloaded
11:53 mark: Upgrading firmware of scs-c1-pmtpa
July 5
23:53 ^demon: scratch that....ssh just seems to have been rather slow in getting its act together. ci.tesla is just fine now
23:50 ^demon: well now I've locked myself out of ci.tesla. Seems it doesn't start ssh on reboot...what a silly thing to do
23:47 ^demon: rebooting ci.tesla since it was horribly hung up on the latest build--was it really stuck for the past 24hrs?
12:52 mark: Started rsync of upload images on ms7 to ms1002 (eqiad) in a screen on ms1002
12:35 mark: OS-installed ms1002
08:43 morebots died because of netsplit
04:58 Tim: setting up lighttpd mod_auth on lily
July 1
20:18 logmsgbot: hashar: ci.tesla: installed exiv2 (required for some Api tests)
20:13 andrewS: mw1-mw22 racked and drac setup
19:49 RobH: typo in my number sequence update for the new 74 mw servers, updating dns to fix
19:09 Ryan_Lane: Changing text backend for https in esams to textsvc IP instead of text.esams.wikimedia.org IP
18:49 Ryan_Lane: restarting pybal on amslvs1 and amslvs3
18:34 logmsgbot: midom synchronized php-1.17/wmf-config/InitialiseSettings.php 'removing mysql from pcache, need to repartition, current scheme too costly on memory'
18:25 Ryan_Lane: adding textsvc.esams IP to amssq and knsq, and amslvs servers
18:23 logmsgbot: hashar: ci.tesla install wbritish-small package to get a word list in /usr/share/dict
18:08 Ryan_Lane: restarting pybal on lvs2 to add textsvc service
17:29 logmsgbot: midom synchronized php-1.17/wmf-config/InitialiseSettings.php 'adding enwiki to the mix'
17:27 logmsgbot: midom synchronized php-1.17/wmf-config/InitialiseSettings.php 'sending a bit lower cache load to db40, though I guess that was initial shock before'
17:25 RobH: Aware that #wikipedia is down and fixed, in less than a minute, woot
17:25 logmsgbot: midom synchronized php-1.17/wmf-config/InitialiseSettings.php 'oops, should have started with lower load, opentable cache..'
17:23 logmsgbot: midom synchronized php-1.17/wmf-config/InitialiseSettings.php 'sending cache load again to db40'
17:16 Ryan_Lane: adding text.svc address to pmtpa text squids and lvs2
17:08 Ryan_Lane: adding text.svc private address to esams and pmtpa DNS
14:56 RobH: pdns crash after rehup on ns1.wikimedia.org, restarting pdns
14:49 RobH: dns updated to remove the mgmt ip addresses for srv302+ and replace them with mw1-mw74
14:33 RobH: updating dns, removing entries for non-existent srv302+ and replacing them with mw1-mw74
11:18 andrewS: motion sensors are being installed today in sdtpa that are going to trip the lights when there is movement. not sure how this will affect cameras. will update once their work is done later today.
07:33 andrewS: the 3 new racks in pmtpa have power plugs running to them now.
05:51 apergos: cron job set up on snapshot1 to dump central auth tables (in subdir by date just like all the private wikis) on weekly basis, runs as backup user
00:01 RobH: disabled the notify requesters on resolve in the hw-decommission queue in RT