17:11 RoanKattouw: Syncing dblist to make switchover work
17:09 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Switch all wikis in switchover-jun30.dblist to Vector, UsabilityInitiative, new thumb size and new logo'
17:09 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'Switch all wikis in switchover-jun30.dblist to Vector, UsabilityInitiative, new thumb size and new logo'
21:41 RoanKattouw: Running scap to deploy UsabilityInitiative and Vector fixes, for real this time
21:27 RoanKattouw: Strike that, putting the code on test first
21:26 RoanKattouw: Running scap to deploy UsabilityInitiative and Vector fixes
20:46 Fred: removed A record for volunteer.w.o since it was pointed out that it linked to a winamp playlist.
19:14 Fred: removed Naoko from lists to prevent bounces :(
18:46 Fred: removed cron entry from isidore for bz-reporter -> installed mail on Kaulen which will now handle the weekly reports.
17:54 Rob: connected mr1-pmtpa port 0/1 to csw5 port 7/1
17:48 Rob: and msw1-pmtpa:port24 to msw-b2 (port unimportant)
17:47 Rob: new mgmt connections: mgmt-gateway: serial to scs-ca-pmtpa:port1; mgmt-gateway:0/0 to msw1-pmtpa:port23, msw1-pmtpa:mgmt to scs-c1-pmtpa port2
16:23 Rob: dataset1 bad disk swapped and array shows rebuilding
15:37 Rob: srv89 resurrected and playing nicely with the other server kids
15:15 Rob: srv146 updated (puppet,sync-docroot,sync-apache), working, and back in lvs cluster
15:09 Rob: fixing srv146, removed from lvs until fixed
13:17 mark: (and domas) Restarted DNS on ns0 and ns2
10:21 mark: Depreffed _1299_16265_43821_ route
June 21
18:56 tomaszf: sync'ing public dir from storage2 to dataset1
18:29 tomaszf: importing prod civi database onto dev
12:36 domas: so easy to earn a beer!
12:32 domas: powercycled pdf1, OOM
June 20
16:27 JeLuF: lighttpd issue on lily, it only listens for port 443 on IPv4, not on IPv6, server.use-ipv6 = "enable" is set, it listens to port 80 on both v4 and v6.
June 19
20:42 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '24042 - Namespace Aliases for English Wikibooks'
19:10 logmsgbot: catrope synchronized php-1.5/includes/StringUtils.php 'Revert live hack'
18:57 logmsgbot: catrope synchronized php-1.5/includes/StringUtils.php 'Live hack for debugging OOM in '
18:53 JeLuF: fixed broken wikimedia-task-appserver package on fenari. PHP maintenance scripts were no longer running.
16:40 Rob: updated dns with analytics and donation ip assignments on the tesla dev box
04:44 logmsgbot: andrew synchronized php-1.5/wmf-config/CommonSettings.php 'Bump style version'
03:18 logmsgbot: andrew synchronized php-1.5/extensions/StrategyWiki/ActiveStrategy/ActiveStrategy_body.php 'Merge r68206'
03:13 logmsgbot: andrew synchronized php-1.5/extensions/StrategyWiki/ActiveStrategy/ActiveStrategy_body.php 'Merge r68204'
03:05 logmsgbot: andrew synchronized php-1.5/extensions/StrategyWiki/ActiveStrategy/ActiveStrategy_body.php 'Fix column ambiguity not found on my local install'
03:03 logmsgbot: andrew synchronized php-1.5/extensions/StrategyWiki/ActiveStrategy/ActiveStrategy_body.php 'Merge r68202'
02:48 logmsgbot: andrew synchronized php-1.5/extensions/StrategyWiki/ActiveStrategy/ActiveStrategy_body.php 'Merge r68200'
02:34 logmsgbot: andrew synchronized php-1.5/extensions/StrategyWiki/ActiveStrategy/ActiveStrategy_body.php 'Merge r68198'
02:21 logmsgbot: andrew synchronized php-1.5/extensions/StrategyWiki/ActiveStrategy/ActiveStrategy_body.php 'Deploy r68196'
02:20 logmsgbot: andrew synchronized php-1.5/extensions/StrategyWiki/ActiveStrategy/ActiveStrategy.php 'Deploy r68196'
June 17
21:22 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Bump style version appendix'
05:30 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '23988 - Switch the right that crats are able to give and take away to Transwiki Importer instead of Importer'
23:47 Tim: on enwiki master, was seeing 14 threads queued for up to 50 seconds waiting to do Title::invalidateCache() in response to FlaggedArticleView::addToDiffView(). Disabled that caller, fixed now
17:51 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 23978: Enable subpages in Outreach namespace on officewiki'
14:57 logmsgbot: aaron synchronized php-1.5/wmf-config/flaggedrevs.php 'Fixed $wgFlaggedRestrictions on hewikisource. Changed editor -> review; there is no editor right. Probably broken for ages.'
14:41 logmsgbot: aaron synchronized php-1.5/wmf-config/flaggedrevs.php 'Made autoreview right work without review right present.'
20:09 JeLuF: srv217: dpkg reported broken gmond package, refused further updates. Fixed. Run puppetd --test.
17:35 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '22616 - Create aliases for Romanian namespaces'
16:47 logmsgbot: jeluf synchronized php-1.5/wmf-config/flaggedrevs.php '23948 - Remove autoreview bit from editor usergroup on enwikinews'
16:44 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '23953 - Add rollback permission to the patrollers group in the French Wikisource'
13:39 logmsgbot: andrew synchronized php-1.5/extensions/StrategyWiki/ActiveStrategy/ActiveStrategy_body.php 'Merge r67943'
13:32 logmsgbot: andrew synchronized php-1.5/extensions/StrategyWiki/ActiveStrategy/ActiveStrategy_body.php 'Merge r67941'
13:32 logmsgbot: andrew synchronized php-1.5/extensions/StrategyWiki/ActiveStrategy/ActiveStrategy.php 'Merge r67941'
12:15 RoanKattouw: Had JeLuF change l10nupdate to run as catrope rather than brion, should shut up the SSL error and make it easier for me to fix in the future
08:03 logmsgbot: aaron synchronized php-1.5/wmf-config/flaggedrevs.php 'add reviewer group back to flaggedrevs_labswikimedia'
02:01 logmsgbot: LocalisationUpdate failed
June 12
21:33 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '23935 - Autopatroller group for Swedish Wikisource (svwikisource)'
21:28 JeLuF: #23928 - change nyc.wikimedia.org redirect target in redirects.conf
07:23 mark: Dumping Squid caches for top-9 wikipedias
07:23 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Vector rollout, logo change, thumbnail size change on top 10 wikis sans enwiki'
07:23 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'Vector rollout, logo change, thumbnail size change on top 10 wikis sans enwiki'
23:27 logmsgbot: tfinc synchronized php-1.5/wmf-config/CommonSettings.php 'Adding global support page to tracking'
23:05 logmsgbot: tfinc synchronized php-1.5/extensions/ContributionReporting/ContributionTrackingStatistics_body.php 'adding range support to Special:ContributionTrackingStatistics'
22:36 logmsgbot: tfinc synchronized php-1.5/wmf-config/InitialiseSettings.php 'turning off fr redirect till fix is in place'
22:08 logmsgbot: fvassard synchronized php-1.5/wmf-config/InitialiseSettings.php 'enabling mobile redirect on FRwiki'
21:34 Fred: mobile LVS setup. Traffic to mobile now balanced between mobile1 (30%) and mobile2 (70%)
19:58 Rob: updated civicrm certificate file, had to apache2ctl restart on grosley
19:03 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Add .omniplan to extension whitelist for private wikis. Requested by Pete'
18:56 Fred: new wikimedia-task-dns-auth deployed to nameservers. langlist regenerated. *lang*.m.wikipedia.org is now cnamed to m.wikimedia.org
16:41 Rob: pushed updates to wordpress plugins on singer
16:40 Fred: I foobared. need to redirect traffic to mobile2 so mobile1 can be re-imaged.
15:20 Rob: updated singer and dns records for austin's setup of wm10reg
01:29 Tim: running throughput test on mobile2
June 7
23:12 Fred: modified puppet recipe for mobile to allow for easier passenger gem update.
22:04 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Bump style version appendix'
19:12 logmsgbot: tfinc synchronized php-1.5/extensions/CentralNotice/SpecialNoticeTemplate.php 'Picking up pagination from r67553'
18:56 logmsgbot: tfinc synchronized php-1.5/extensions/CentralNotice/SpecialNoticeText.php 'Picking up pagination from r67553'
16:57 Fred: Limiting message size to 5M for lists.
16:43 Tim: moved mediawiki-cvs archive files away to prevent further archiving
16:26 Tim: disabled archiving on problematic list mediawiki-cvs
14:07 logmsgbot: andrew synchronized php-1.5/extensions/StrategyWiki/ActiveStrategy/ActiveStrategy_body.php
14:04 logmsgbot: andrew synchronized php-1.5/extensions/StrategyWiki/ActiveStrategy/ActiveStrategy_body.php
13:49 logmsgbot: andrew synchronized php-1.5/extensions/StrategyWiki/ActiveStrategy/ActiveStrategy_body.php
13:48 logmsgbot: andrew synchronized php-1.5/extensions/StrategyWiki/ActiveStrategy/ActiveStrategy_body.php
13:46 logmsgbot: andrew synchronized php-1.5/extensions/StrategyWiki/ActiveStrategy/ActiveStrategy_body.php
07:43 Tim: on lily: fixed broken sources.list and upgraded python, installed python symbols
07:25 Tim: on lily: restarted mailman (including ArchRunner)
07:07 Tim: on lily: there are three instances of ArchRunner which are in a tight loop using CPU but not doing any syscalls, presumably dead. Renicing them, will try to make backtraces
03:41 Tim: srv281 back up and resynced. However it crashed after being up only 3 days and shows more machine check errors. Suggest RMA.
03:25 Fred: setup ganglia for lily.
03:07 Tim: srv281 down for 41 hours, trying reboot
02:54 Fred: upgraded gmond on lily. Modified gmond.conf to handle new modular arch.
02:46 Tim: on lily: experimentally disabling broken bayes feature in spamassassin
02:35 Tim: on lily: restarting spamd and fixing lock file errors
01:25 Tim: on mobile1: Reduced PassengerMaxRequests to 5000 to reduce memory leakage
07:33 Tim: on mobile1: use PassengerMaxRequests instead of PassengerPoolIdleTime, to avoid oscillation between 200 and 30 processes, once every half an hour due to unknown slowdown
03:10 Tim: serve a 404 error for requests to the mobile server for domains other than *.m.wikipedia.org. DNS points here for lots of domains.
02:14 Tim: installed a redirect from en.m.wikipedia.com to .org
01:18 Tim: changed the mobile1 apache access log format to something more useful (and squid-like)
00:47 Tim: on mobile1: installed logrotate script for apache2 (via puppet)
02:09 Tim: restarting gmond on all miscellaneous cluster servers, to make mobile2 reappear in ganglia (broken by IP renumber)
June 4
22:52 tomaszf: starting webstats with new binary
22:50 tomaszf: stopping webstats in prep for update to track mobile stats
19:30 atglenn: moved bad snapshots (apr 11 through may 6 2010) to /mnt/dumps/public/bad so public index shows only good dumps and so there will be no prefetch against them
18:47 Fred: moved mobile2 to squid vlan / re-ip'ed / dns changed. mobile1 => 115 mobile2 => 116
18:35 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Bump style version appendix. Gotta kill this thing some time'
09:50 Tim: pushing out WikimediaMobile (r67331) in preparation for deployment on testwiki
08:44 domas: decreased keepalivetimeout and timeout on mobile1
08:35 Tim: on mobile1: reduced max passenger pool size to 200, Domas and I think it's about right, shouldn't exceed allowable memory, should give us close to 100% CPU.
08:26 Tim: on mobile1: domas fixed file limit, now 50k
08:10 Tim: increasing MaxClients on mobile1 to 1500
05:01 Fred: Added apache2.conf, memcached.conf to puppet recipe for mobile.
03:43 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '23784 - Modify add/remove rights for bureaucrats on officewiki'
02:46 Tim: mobile1: increased ServerLimit to 1500 and reduced MaxClients to 500
02:35 Tim: on mobile1: increased memcached memory limit from 64M to 5000M
02:15 Tim: switched mobile1 over from apache2-mpm-worker to apache2-mpm-prefork (via puppet)
01:03 Tim: set ganglia host_dmax to 1 day
June 3
21:57 Fred: mobile1 re-imaged and puppetized. Changed subnet for mobile1. Changed DNS for mobile1. m pointing to newly imaged mobile1 (until transition is completed)
08:39 Tim: checked all serial consoles, all nonresponsive, rebooted all
08:23 Tim: sq33, sq34, sq35, sq37, sq38, sq40, sq45 have been down for 16-28 days, apparently for no good reason, can't find any log or DT entries. Will try reboots.
07:56 Tim: added new squids to nagios
06:36 Tim: cleaning cache directories on sq56 to avoid resurrection of expired content
06:35 Tim: adding monitoring for rather important service IPs: upload.esams and text.esams
06:22 Tim: sq56 not responding to ping or serial console (for 4 days), nothing in racadm getsel, rebooting
06:07 logmsgbot: tstarling synchronized php-1.5/wmf-config/InitialiseSettings.php 'disabling ClickTracking due to CR r58099'
05:24 Tim: started apache on srv216, was stopped for some reason
03:57 Fred: shutting down mailman on list for a few minutes while exim and spamd catch up
01:42 Tim: adding forward and reverse DNS for mobile.tesla.usability.wikimedia.org, 208.80.152.245
June 2
23:48 Fred: re-imaging mobile2 again ;p
21:12 logmsgbot: robh ran sync-common-all
20:57 Rob: srv134 has temp errors and multiple bad fans, out of warranty, decommissioning and removing from nagios/lvs/dsh
20:52 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '23756 - Enable "Rollbacker" group on itwiki'
20:44 Rob: how many ops does it take to bring srv281 back online? one + a volunteer to point out he did it wrong.
19:52 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '23728 - Local time for Croatian Wiktionary'
19:45 JeLuF: sync-apache doesn't update /etc/apache2/apache2.conf on all hosts, two different versions are currently in use. Probably due to missing /home NFS mount on some apaches
19:17 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '23737 - Enabling extension AbuseFilter for Upper Sorbian and Lower Sorbian Wikipedia'
19:13 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '15063 - wpSpamRegex entry for large image tables'
19:08 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '23728 - Local time for Croatian Wiktionary'
09:36 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'Enable $wgEnotifWatchlist on outreachwiki'
06:57 Tim: installed NRPE on db22
06:53 Tim: installed NRPE on db14
06:51 Tim: installed NRPE on db16
06:24 Tim: db16 has the same problem, killed 6 times today, taking the same action (27GB to 24GB)
06:16 Tim: mysql on db14 has been killed 4 times today by the OOM killer, reducing innodb_buffer_pool_size from 28GB to 24GB
06:13 Tim: mysql on db14 killed by kernel OOM killer, recovered without intervention
23:17 apergos: started third worker on snapshot3. I'm going to stop logging these now.
16:15 apergos: started second worker doing xml dumps on snapshot3 (same screen session as root)
03:57 apergos: xml dumps resumed: running one copy of "worker" on snapshot3, in a screen session as root, along with "monitor" modified in place to not meddle with lock files (so that it leaves the enwiki dump, in progress on snapshot2, undisturbed)
May 28
23:37 atglenn: taking snapshot3 out of sync updates for a few days (commented out in mediawiki-installation node file)
06:31 domas: upgrading LOM/BIOS/OS on db14,db16,db22 to 3.0/10.4
02:41 Tim: srv92 locked up for a few minutes with syslog showing SATA errors. May be near death.
01:17 logmsgbot: tstarling synchronized php-1.5/wmf-config/CommonSettings.php 're-enabled CategoryTree, unless there is some emergency, you can't just disable extensions like this'
May 27
21:35 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'Disable CollapsibleNav on outreachwiki'
20:24 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'Enable Vector and usability features on outreachwiki and foundationwiki'
20:17 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'Enable CollapsibleNav for beta users on all wikis'
20:17 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Enable SimpleSearch for beta users on all wikis'
04:13 logmsgbot: aaron synchronized php-1.5/wmf-config/InitialiseSettings.php 'vector as default skin for flaggedrevs_labswikimedia'
02:40 Tim: ran namespaceDupes.php with suffix on jawiki to fix talk namespace duplicates
May 26
23:59 Rob: updated blog to vector earlier today, didn't log... just pushed another update to the theme, blog is still happah
21:43 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'updated foundationwiki default skin from monobook to vector'
17:02 Tim: on mobile1 nginx: set proxy_next_upstream=off to avoid having the site go down regularly for 10s at a time
16:30 Tim: changing mobile1 nginx log format
06:39 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '23667 - Local time for Macedonian Wikipedia'
03:39 Tim: set up SSLCertificateChainFile on williams (otrs) per bug 23631, used the cut down version of gd_bundle.crt posted on the bug report
02:08 logmsgbot: andrew synchronized php-1.5/includes/LogEventsList.php
02:07 logmsgbot: andrew synchronized php-1.5/includes/specials/SpecialRevisiondelete.php
02:07 logmsgbot: andrew synchronized php-1.5/languages/messages/MessagesEn.php
02:05 Andrew: Deploying r66793 and r66823, first to test.wikipedia.org for final testing, then to all sites.
May 25
17:02 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Re-add style version appendix now that borked usability JS was purged from Varnish'
12:51 domas: what happened - server goes down, configuration gets updated, server comes up, builds new cache out of old conf, sync has timestamp older than cache, mw has stale config
12:47 domas: cleaned /tmp/mw-cache for some servers - apparently sync-common is not enough to get stuff in sync anymore :) we have now monobook pages in caches, I guess.
12:26 rainman-sr: freed up some space and restarted indexing on searchidx1, indexing was stuck because the disk was full
11:16 mark: Downgraded puppet on db36
06:56 domas: resynced srv206/srv193 - they seem to have lost critical configuration updates while they were down. Again, why are servers not resynced after downtime?
16:18 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Bump style version appendix'
16:17 logmsgbot: catrope synchronized php-1.5/extensions/UsabilityInitiative/js/plugins.combined.min.js 'Resync, seems this didn't get updated properly last time'
09:53 RoanKattouw: Deployed new favicon and apple touch icon on test; overwrote the latter locally on srv124, will be overwritten by scap until we change the wikipedia.org-wide default
19:50 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '23596 - Add the 'Vikiprojekts' namespace and a related 'Vikiprojekta diskusija' namespace to the Latvian Wikipedia'
13:55 mark: Unshut BGP session to 13030, and reenabled announcement of our prefixes on the session
13:41 mark: Changed IP address of text.esams.wikimedia.org (i.e. the main IP address of our wiki farm at esams) from 91.198.174.2 to 91.198.174.232
13:14 logmsgbot: andrew synchronized php-1.5/extensions/StrategyWiki/ActiveTaskForces/ActiveTaskForces_body.php
13:11 logmsgbot: andrew synchronized php-1.5/extensions/StrategyWiki/ActiveTaskForces/ActiveTaskForces_body.php 'Testing'
13:09 logmsgbot: andrew synchronized php-1.5/extensions/StrategyWiki/ActiveTaskForces/ActiveTaskForces_body.php 'ActiveTaskForces: Merge r66686'
13:06 mark: Added new LVS service IPs to the older esams text squids
10:53 mark: Removed /etc/cron.hourly/mw-cleanup-tmp on all appservers
May 19
21:34 mark: Manually updated countries.nerd.dk databases with new versions on the auth dns servers
20:16 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '23568 - Allow bureaucrats to grant and remove patroller, autopatroller, rollbacker on eswiktionary'
20:11 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '23592 - Disable creating new articles for IP-users in ckbWikipedia'
20:07 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '23576 - Alias of Portal talk namespace on jawiki, jawikinews'
15:39 Rob: mchenry's drive replaced, it will start auto mirroring shortly
15:33 Rob: pulled mchenry's hot swap spare to put in a 1.5TB for migration
15:33 Rob: welcome back morebots
May 18
20:46 Rob: got distracted, but mailman is back online now.
20:43 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '23545 - Rights changes for frwiktionary'
20:23 Rob: have to pull a message from a list, stopping mailman on lily to edit the mbox (will restart and log its restart shortly)
14:32 Andrew: 13:50 <+logmsgbot> !log andrew synchronized php-1.5/wmf-config/InitialiseSettings.php 'Activate single-revision deletion for admins on all projects'
19:42 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '23351 - Please remove the 'reviewer' user group from en wikinews if possible'
19:41 logmsgbot: jeluf synchronized php-1.5/wmf-config/flaggedrevs.php '23351 - Please remove the 'reviewer' user group from en wikinews if possible'
19:29 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '23488 - Add rename user group on Commons'
19:22 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '23545 - Rights changes for frwiktionary'
09:17 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Reenable SimpleSearch in config (still disabled in JS), bump style version appendix'
09:16 RoanKattouw: Commented out srv141 in /etc/dsh/mediawiki-installation to stop sync-file from hanging forever
09:15 logmsgbot: catrope synchronized php-1.5/extensions/UsabilityInitiative/Vector/Vector.combined.min.js 'Disable SimpleSearch in JS with if(false)'
08:41 domas: banned a sucker via badbadip
08:22 domas: site had some overload for ~15mins, two memcached/app servers didn't recover afterwards (I rebooted srv181, srv141 is still in oom or some other limbo)
21:31 mark: Setup amslvs2 and amslvs4, created fenari:/home/w/conf/pybal/esams/ for having a central location where PyBal can fetch realserver lists from
08:46 RoanKattouw: refreshCategoryCount.php running nicely on Commons now, should finish in about 2 hours
08:29 RoanKattouw: Running refreshCategoryCount.php on Commons again. Hacked script to print more progress markers, hacked Category.php to not do LOCK IN SHARE MODE
08:23 RoanKattouw: Nope, that didn't work
08:22 RoanKattouw: Attempting to run refreshCategoryCounts.php on Commons, last attempt failed with DB deadlock almost immediately
08:04 mark: Sample rate of purging increased to 1/10
08:02 mark: First stage of purging (all squid objects with known URL) done, now enabling live purging of requested enwiki pages, with 1/100 sample rate
20:08 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '23462 - please allow Crats to add/remove the Transwiki importer right on SimpleWiki'
18:38 Rob: had wrong setting in name for dns on new scs, updated dns to correct
18:18 Rob: updated dns for scs-b1-pmtpa
17:14 schmir: schmir drinks beer
17:08 Rob: project2 had two bad fans. if it fails a lot more, it may have system damage from the overheating
16:55 Rob: replicated package installation from pdf2 to pdf3
16:14 Rob: pdf3 online and setup with identical users to pdf1/2, handed off to pediapress for mwlib magics
14:59 Rob: updated dns due to linne and gilman having same IP in zone file, but linne was not listed in the reverse dns file. changing gilman IP info since it's not online and in cluster just yet.
14:39 Rob: updated dns for pdf3 server
14:08 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'Enable PDF creation for anons on enwiki'
23:39 Tim-away: installed the new oggvideotools on srv224, source install from /root/src
20:00 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 21375: Allow bcrats to remove sysop, bcrat on arbcom_fiwiki'
16:51 Rob: ryan rebooted prototype vm on tesla, it was giving oom errors, software on system needs to be adjusted, as the system has 4GB dedicated to that virtual host.
00:57 logmsgbot: midom synchronized php-1.5/includes/User.php 'getAllCallers for user::invalidate'
00:33 logmsgbot: midom synchronized php-1.5/includes/Title.php 'verbose fname hook for invalidation'
00:07 logmsgbot: midom synchronized php-1.5/wmf-config/db.php 'removing srv169 from cluster9 ES config - seems to be freshly imaged server without the datastore atm, needs resync'
April 27
22:47 Tim: removed overly-broad block of all user agents containing "download" from common-acls.conf
16:36 Rob: tridge coming down for disk array addition
14:26 Rob: replaced disk 16 and 25 in dataset1
April 26
21:16 logmsgbot: ariel synchronized wmf-deployment/wmf-config/InitialiseSettings.php 'remove ip throttle exception for en pedia new user seminar, ended'
15:55 apergos: shutting down back end squid on knsq2, sdb bad
12:51 ^demon: Setup cron on mayflower for /home/demon/svn-dump-update every saturday at 1am.
07:45 mark: Removed sendmail from mobile1, so it stops spamming with cron messages. Why was it installed?
13:19 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'Touch this file to purge config cache'
11:53 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/classes/View.php 'Merge r65305'
11:52 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads_alpha/classes/View.php 'Merge r65305'
11:49 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads_alpha/classes/View.php 'Merge r65302'
11:49 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/classes/View.php 'Merge r65302'
11:28 ^demon: svn dump successfully completed. Dumped revs 1-65293 to /home/demon/svn/mediawiki-20100418-dump-with-deltas. Took up ~1.1G
11:13 logmsgbot: andrew synchronized php-1.5/wmf-config/CommonSettings.php 'Bump style vers'
11:13 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/lqt.css 'Deploy r65297'
11:11 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads_alpha/lqt.css 'Deploy r65297'
10:40 logmsgbot: andrew synchronized php-1.5/extensions/NewUserMessage/NewUserMessage.class.php 'Deploy r65295'
10:27 ^demon: Running 'svnadmin dump --deltas' in a screen on Mayflower. Writing to storage2 Tomasz mounted at /home/demon/svn. Already to ~r30000
April 19
23:14 atglenn: one more time with elwiki, cswiki and ltwikt worker threads (shooting parent processes as well this time) on snapshot3
23:08 domas: started slow query killer on db9 - it was constantly alerting in #wikimedia-tech
22:32 atglenn: killed fetchText.php on snapshot3 for ltwikt, cswiki, thwiki, all hung on writes (nfs issue ? they were stuck from April 16 on... coincidentally when cname change went out for download.wm -> dumps.wm) Dumps seem to be running normally now
19:56 Tim: upgraded libavcodec and libavformat on image_scalers node group due to USN-931-1
00:18 Fred: made room on McHenry to resume backups from lily and sanger.
April 18
07:37 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '21243 - Request for PDF Version & Page collection'
07:27 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php ' 23218 - Add CAT: as namespace alias for Category: on enwikinews'
April 16
23:02 JeLuF: 16412 - Create a redirect from hu.wikimedia.org to http://wiki.media.hu
21:24 JeLuF: running cleanupTitles.php on *.wiktionary.org
21:01 JeLuF: updated site_stats:ss_images on plwiki
20:48 JeLuF: Redirect download.wikipedia.org and other aliases to dumps.wikimedia.org
20:07 JeLuF: 17008 - Redirecting from eo.wikisource.org
19:37 JeLuF: enabled on-demand SVG compression on upload.wikimedia.org
20:04 logmsgbot: ariel synchronized php-1.5/wmf-config/InitialiseSettings.php 'Add group IP to wgRateLimitsExcludedIPs for a wp for new users conf. To be removed Apr 20 2010'
18:27 atglenn: (yesterday's news) dataset1 back up with smaller ext3 partition replacing the xfs one, for testing
15:20 RoanKattouw: Deleting about 10k redundant redirects on mlwiki using deleteBatch.php
14:29 logmsgbot: catrope synchronized php-1.5/includes/EditPage.php 'r64882 for bug 23139'
12:52 mark: Set up BGP prefix advertisements through summarization with aggregate addresses, instead of iBGP announcement of the summary addresses of AS43821
03:27 logmsgbot: tstarling synchronized php-1.5/includes/specials/SpecialUpload.php 'r64847: fix $1 in uploadtext'
02:44 logmsgbot: tstarling synchronized php-1.5/includes/specials/SpecialContributions.php 'r64845, diff/hist link swap'
01:27 Tim: running cleanupTitles.php on ml wikis (starting with ml.wiktionary)
April 9
22:35 logmsgbot: catrope synchronized php-1.5/includes/parser/Parser.php 'r64839 Language converter bugfix'
22:32 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Set initialCapital to true for Commons ForeignFileRepo setups. Chad made me do it'
18:22 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '23077 - Extension:Collection for English Wikiversity'
18:19 logmsgbot: jeluf synchronized php-1.5/wmf-config/CommonSettings.php '23081 - Remove "makesysop" right from bureaucrats and stewards groups'
17:42 mark: Implemented OSPFv3 on AS43821
17:00 Rob: db31-db40 racked and lom setup (except db34 which has an unresponsive bmc fresh from factory).
16:10 mark: Rebooted br1-knams after it got confused about OSPF area 0 vs 0.0.0.0
15:10 mark: Changed OSPF reference bandwidth from 100 Mbps to 100 Gbps on all AS43821 routers
06:25 logmsgbot: kate synchronized php-1.5/wmf-config/db.php
06:25 river: removing lomaria to dump s5 for TS
06:22 logmsgbot: kate synchronized php-1.5/wmf-config/db.php
06:22 river: removing db30 to dump s2 for TS
05:53 The_Thing: Can I test this?
02:59 logmsgbot: andrew synchronized php-1.5/includes/media/Bitmap.php
02:58 logmsgbot: andrew synchronized php-1.5/includes/DefaultSettings.php
02:58 Andrew: Seems to work correctly, deploying fix to all wikis.
02:55 Andrew: Deploying r64727 on testwiki for testing GIF scaling fixes.
01:43 Tim: on kaulen: reset the root mysql password and saved it in /root/.my.cnf for future reference
01:05 Tim: installing PHP on kaulen and various MW dependencies
April 7
21:56 logmsgbot: andrew synchronized php-1.5/wmf-config/InitialiseSettings.php
21:55 Andrew: Sticking LiquidThreads on officewiki, by Erik's request.
20:52 Fred: ganglia shut down for a couple minutes while I grow the tmpfs.
18:13 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Change PrefStats tracking for wikis with Vector/toolbar enabled by default'
19:55 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Disable collapsiblenav explicitly in wake of usability deployment'
18:19 RoanKattouw: Manually corrected time on prototype, was 6 hours behind. Guess we need NTP there too
16:56 Rob: rebooted srv119
16:31 Rob: srv89 had temp warnings and failure to respond to console, replaced two bad fans in chassis with fans from srv127
16:22 Rob: srv98 was unresponsive to console, rebooted and appears to be back online and working just fine for now.
16:20 Rob: srv127 decommissioned. dead hard disks, dead fans, temp warnings, no lom, and out of warranty. will be parted out to other servers
16:08 Rob: srv127 throwing both hard disk errors and temp warnings
15:00 Rob: db28 is not in any current use due to memory issue, investigating memory issue now for hardware warranty replacement
05:33 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php 'Tim says this helps'
05:02 logmsgbot: jeluf ran sync-common-all
04:59 logmsgbot: jeluf ran sync-common-all
00:50 Tim: increased apache TimeOut setting on singer from 5 to 30 seconds, to hopefully fix multiple reports of high error rates on secure.wikimedia.org
March 30
20:18 Rob: new server gilman racked for future secure gateway use and dns updated for server fqdn
20:17 Rob: back to work morebots
17:22 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '22650 - bureaucrats should only be allowed to remove abusefilter and bot, not the other stuff'
09:21 logmsgbot: andrew synchronized php-1.5/wmf-config/CommonSettings.php 'Bump style version'
09:20 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/lqt.css 'Deploy r64384'
09:20 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads_alpha/lqt.css 'Deploy r64384'
14:29 RoanKattouw: prototype.wikimedia.org VM seems to have crashed, prototype down
March 28
21:25 logmsgbot: andrew synchronized php-1.5/wmf-config/CommonSettings.php 'Add a new blocked email address'
18:49 RoanKattouw: Ganglia monitoring of Florida Squids broken, reports all boxes down
16:33 apergos: upped maxclients and turned keepalive back on on singer (these were changed last week during the site outage)
March 27
18:21 JeLuF: upgraded ViewVC to 1.1.4
07:51 JeLuF: rolled out new apache config with a catch-all for www.*.(wikisource|wiktionary|wikiversity).org and www.(meta|commons).wikimedia.org, see bug 1698
18:37 river: delegated 1.24.10.in-addr.arpa to ha-dns-auth.toolserver.org
18:17 Rob: techdetail: lowered maxclients on singer apache2 and disabled secure site in apache configuration, do not rollback changes to normal until after esams is back online
18:16 Rob: techblog back up, secure is NOT up, and is not expected to come back up, until the esams outage is resolved
18:01 Rob: disabled secure on singer, as everyone trying to use it overloaded both secure and blogs.
19:22 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Add vector-collapsiblenav as a hiddenpref'
19:21 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php 'removed more group permission for closed wikis'
18:34 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php 'Remove all simplewikibooks configs from Initialisesettings'
18:32 logmsgbot: jeluf synchronized php-1.5/wmf-config/CommonSettings.php 'REVERT Test case for simplewikibooks'
18:29 logmsgbot: jeluf synchronized php-1.5/wmf-config/CommonSettings.php 'Test case for simplewikibooks'
18:28 Rob: sq34 & sq35 coming down for reinstallation
18:23 logmsgbot: jeluf synchronized php-1.5/wmf-config/CommonSettings.php 'REVERT Test case for simplewikibooks'
18:22 logmsgbot: jeluf synchronized php-1.5/wmf-config/CommonSettings.php 'Test case for simplewikibooks'
18:19 Rob: sq33 is reinstalled and doing quite fine.
16:02 Rob: puppet stopped on sq34-sq50, DO NOT RESTART PUPPET. These hosts need reinstallation before puppet can be restarted, or squid changes can be deployed to them.
15:52 Rob: changed the setting for pmtpa to change partitioning schema and up the cache size. Do not deploy old squids with this information until Rob finishes reinstalling them
15:52 Rob: changed squid config in text squids for hard disks update on reinstalls
12:01 logmsgbot: jeluf ran sync-common-all
11:59 logmsgbot: jeluf synchronized closed.dblist
March 22
21:02 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '#21213 - logo for ace wikipedia'
12:14 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php 'bgwiki group permissions fixed'
11:37 logmsgbot: jeluf synchronized closed.dblist
11:04 JeLuF: removed tokipona wikis from all.dblist
11:02 logmsgbot: jeluf ran sync-common-all
10:25 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '#22446 - Change the WT namespace to an alias for Wiktionary, for Swedish Wiktionary'
10:03 logmsgbot: jeluf ran sync-common-all
09:53 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php 'UNDO: 22446 - Change the WT namespace to an alias for Wiktionary, for Swedish Wiktionary - pages still exist in these namespaces'
09:40 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '#16857 - Request to switch over logo for sanskrit language wikipedia localisation'
09:26 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '#22329 - Redirect WV to Wikiversity space for English Wikiversity'
09:22 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '#22446 - Change the WT namespace to an alias for Wiktionary, for Swedish Wiktionary'
09:13 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '22481 - Namespace alias at mwl.wikipedia.org'
09:06 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '22481 - Namespace alias at mwl.wikipedia.org'
06:57 JeLuF: Changed kaulen config to include ServerAlias bugzilla.wikipedia.org for bugzilla.wikimedia.org
March 20
21:21 logmsgbot: aaron synchronized php-1.5/wmf-config/flaggedrevs.php 'removed sysop access to stabilization except on wikis already have config rows. This page should not be enabled by default.'
19:49 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php 'Bug 22549 - Enable email on watchlist change on fr.WV'
14:53 apergos: tossed a few files from tridge to make space, a very short-term solution
14:13 JeLuF: Bug 22681: migrated users on bgwiki dawiki enwiktionary hewiki hrwiki itwiktionary from groups autopatrol and autopatroller to group autopatrolled
16:14 Fred: security updates of apaches in progress
06:22 logmsgbot: tstarling synchronized php-1.5/includes/StubObject.php 'reverted profiling hack, I assume whatever information you wanted to get out of this, you've got now'
00:33 logmsgbot: tfinc synchronized php-1.5/wmf-config/InitialiseSettings.php 'Adding override for strategy and usability wiki for wmgNoticeProject'
00:05 logmsgbot: tfinc synchronized php-1.5/wmf-config/InitialiseSettings.php 'Changing mapping of commosn under wmgNoticeProject to just the string commons'
00:03 logmsgbot: tfinc synchronized php-1.5/wmf-config/InitialiseSettings.php 'Commenting out commons from wmgNoticeProject to fix loading'
04:52 logmsgbot: andrew synchronized php-1.5/wmf-config/InitialiseSettings.php 'Make Vector and usability enhancements default on liquidthreads_labswikimedia'
04:46 logmsgbot: andrew synchronized php-1.5/wmf-config/CommonSettings.php 'Bumping style version'
04:43 Andrew: Scapping to deploy LiquidThreads alpha update
04:40 Andrew: Preparing for LiquidThreads alpha update to trunk state
March 13
18:42 Rob: sanger is back online and working
18:33 Rob: sanger moved, coming back online
18:21 Rob: that's sanger, not singer.
18:21 Rob: taking down singer to move it up in the rack, during this downtime (approx 15 min), wikimedia.org email will be unavailable.
18:20 Rob: mchenry is back online and dealing with its 10 minute mail backlog.
18:14 Rob: mchenry moved, bringing it back online
18:01 Rob: mchenry host coming down to move its rack location. Downtime will be approx. 15 minutes. (all mailing list relays may not work and thus new mailing list entries will back up until this move is completed.)
March 12
19:23 logmsgbot: fvassard synchronized php-1.5/wmf-config/CommonSettings.php 'Enabling .odg extension for uploads'
18:49 Fred: moved stats.w.o from Zwinger to Spence
12:00 RoanKattouw: storage2 (download.wikimedia.org) was down between 11:40 and ~11:53 UTC, coincides with load spike
07:43 apergos: really edited the right copy of worker.py this time (the copy in /backups :-P)
06:19 apergos: edited worker.py in place on snapshot3 to remove --opt and replace all except lock options individually (it was running with that on db12 again, with high lag). shot that worker process, lag and thread count going down now
01:32 logmsgbot: tstarling synchronized php-1.5/wmf-config/db.php 'brought db26 into rotation as enwiki watchlist/contribs server, to replace db12 after cache warming'
01:30 Tim: network spike on db12 at 01:00. Going to depool it in case it's going to try the same trick as last time
18:54 Fred: removed duplicate ACL for 208.80.152.157 from text-squids
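Note on the 06:19 entry above: per the mysqldump documentation, --opt is on by default and is shorthand for a fixed group of options, two of which concern locking (--lock-tables and --add-locks). The sketch below shows one way to spell out the non-lock options individually. It is only an illustration: build_dump_args is a hypothetical helper, not the code in worker.py, and the single_transaction switch is an optional extra that this log only raises as a question elsewhere.

```python
# Options that --opt stands for, per the mysqldump documentation.
OPT_EXPANSION = [
    "--add-drop-table",
    "--add-locks",
    "--create-options",
    "--disable-keys",
    "--extended-insert",
    "--lock-tables",
    "--quick",
    "--set-charset",
]

# The two members of the group that deal with table locks.
LOCK_OPTIONS = {"--add-locks", "--lock-tables"}


def build_dump_args(database, single_transaction=False):
    """Hypothetical helper: mysqldump argv without the lock options.

    --skip-opt turns the default --opt group off, then every
    non-lock option is added back one by one.
    """
    args = ["mysqldump", "--skip-opt"]
    args += [opt for opt in OPT_EXPANSION if opt not in LOCK_OPTIONS]
    if single_transaction:
        # Optional: dump inside one transaction instead of locking tables.
        args.append("--single-transaction")
    args.append(database)
    return args


if __name__ == "__main__":
    # Print the command line instead of running it.
    print(" ".join(build_dump_args("enwiki")))
```

The helper only builds the argument list; it does not run mysqldump, so it is safe to try anywhere.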
March 9
23:32 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/classes/View.php 'Merge r63523 into LiquidThreads production'
23:26 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads_alpha/classes/View.php 'Merge r63524 into LiquidThreads alpha'
23:26 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads_alpha/classes/View.php 'Merge r63523 into LiquidThreads production'
23:24 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads_alpha/classes/View.php 'Merge r63524 into LiquidThreads alpha'
23:24 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads_alpha/classes/View.php 'Merge r63523 into LiquidThreads production'
23:15 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/classes/View.php 'Merge r63154 into LiquidThreads production'
23:13 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads_alpha/classes/View.php 'Merge r63154 into LiquidThreads alpha'
20:22 rainman-sr: this morning's search api outage seems to be connected with index update on search8 which triggered high GC load and stuck threads, removing enwp spell index from search8
15:37 hcatlin: message before was referring to mobile site.
15:37 hcatlin: deployed new languages. cs homepages.
14:58 domas: forgot to log - yesterday's API fail was repeated today too, all API nodes were blocked on talking to lucene, and search cluster was idle too, and...
13:48 logmsgbot: catrope synchronized php-1.5/languages/messages/MessagesNds_nl.php 'Fix message that didn't get picked up'
13:46 logmsgbot: catrope synchronized php-1.5/languages/messages/MessagesEt.php 'See if this works'
13:20 RoanKattouw: Rebuilding all l10n caches to prep debugging a LocalisationUpdate issue
13:16 domas: something was wrong with sq31/sq32 - cpu/network shot through roof, can't correlate load
10:42 domas: *sigh*, squid conf: added highPrecedence to defaults, so it is picked up by configuration builder. also, checked in files into RCS (had to take over locks from mr.Starling, sorry!). unbroke test.wikipedia API
09:43 domas: where did test api.php fix go?!
04:38 Fred: removed Bart's ACL in squid cache conf since it was a duplicate ACL.
March 6
22:59 mark: Increased membufs on all backend squids
06:58 logmsgbot: aaron synchronized php-1.5/wmf-config/InitialiseSettings.php 'disabled FlaggedRevs_alpha due to outdated messages issues'
05:40 logmsgbot: aaron synchronized php-1.5/wmf-config/flaggedrevs.php '$wgFlaggedRevsStylePath for labs'
05:33 logmsgbot: aaron synchronized php-1.5/wmf-config/flaggedrevs.php 'Set labs site to use flaggedrevs_alpha'
05:26 logmsgbot: aaron synchronized php-1.5/wmf-config/InitialiseSettings.php 'Set labs site to use flaggedrevs_alpha'
05:10 logmsgbot: aaron ran sync-common-all
00:18 logmsgbot: root synchronized php-1.5/wmf-config/InitialiseSettings.php 'Bug 22728 - create new user group on Greek Wiktionary'
March 5
18:48 Fred: removed duplicate ACL from squid backend config.
18:11 Rob: restarted nrpe daemon on ms1 since it wasn't transmitting to nagios.
00:51 logmsgbot: aaron ran sync-common-all
00:23 logmsgbot: mark synchronized php-1.5/wmf-config/InitialiseSettings.php 'Restore Monobook for nl.wikimedia.org'
March 4
19:44 Tim: installed NRPE and RAID utils on ms1
19:40 Tim: removed db19 from the mysql node group and nagios since it is being used for varnish now
19:29 Tim: fixed prototype.wikimedia.org (nagios critical), was waiting 10s on every request due to memcached being down
March 3
16:16 apergos: reenabled gmetric on ms8 a little bit ago, scrub finished with no errors (yeah right)
06:06 apergos: replication off again tonight, zfs nicely borked on ms8, zfs destroy snapshot X gives message "cannot destroy X, dataset already exists" . time for sun contract and call them
05:53 apergos: gmetrics turned off again on ms8, running zpool scrub export
05:40 apergos: reset SYS on ms8 worked (slowly). gmetrics reenabled. (and I thought morebots was just being a bad bot)
05:39 apergos: ( :-P ) starting catchup for the last day's lost replication ms7 -> ms8
05:22 apergos: reboot failed, bootadm hung probably waiting on zfs. stop SYS claims to work but subsequent show SYS shows power state on.
05:14 apergos: rebooting ms8 to clear zfs deadlock
14:38 RoanKattouw: logmsgbot is missing in action AGAIN, starting it as me on fenari
14:37 RoanKattouw: Synced r63008
12:17 domas: *sigh*, recovered locke after OOM, added few entries to limits.conf
February 25
23:48 Tim: running the new fixBug20757.php on all wikis
21:48 RoanKattouw: Removed empty line from the end of all.dblist, broke account creation ^^
20:18 RoanKattouw: Removed those closed wikis again, db lists back to normal
20:08 RoanKattouw: Temp adding some closed wikis to all.dblist to be able to dump them (again)
19:01 Rob: removed old servers will, bart, etc from DNS
18:32 Rob: pushing more dns changes for LOM on new servers.
00:59 Tim: made the squid purge script on locke also purge redirects from /w/index.php?, for bug 22639
00:32 Tim: installed ack-grep on fenari
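Note on the 21:48 entry above, which ties an empty line at the end of all.dblist to broken account creation: a reader that skips blank lines would tolerate such a file. The sketch below assumes a dblist is plain text with one database name per line; read_dblist is a hypothetical helper, not the code that actually consumes the dblists.

```python
def read_dblist(text):
    """Hypothetical helper: return database names, ignoring blank lines."""
    names = []
    for line in text.splitlines():
        name = line.strip()
        if name:  # skip empty or whitespace-only lines
            names.append(name)
    return names


if __name__ == "__main__":
    # A list with a stray empty line at the end.
    print(read_dblist("enwiki\ndewiki\n\n"))
```

Without the blank-line check, the stray line would come back as an empty database name, which callers looping over every wiki would then try to use.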
February 24
20:48 Rob: IP conflicts on alsted/ersch/formey/kaulen. I must have typo'd something in one of their DRAC configurations because they are all messed up. will go into DC and physically review settings via console.
16:10 Rob: had formey and alsted ip info mixed up in dns, updating dns
13:13 RoanKattouw: Adding optin_survey table to wikis that didn't have it
01:38 atglenn: ganglia 3.1 packages up on ms7 and ms8 (see atg for solaris packages for other boxen)
01:20 domas: that was fun, let's do it again!
01:17 mark: Implemented udp2log filter on locke that pipes redirected URLs to a modified version of purgeList to clean the squid caches
00:59 Rob: added austin to the mortals user category for shell access on puppet, updating singer's account list
00:30 mark: Fixing wikipedia.be vhost ServerAlias setting in redirects.conf
00:14 domas: redefined tiertwo acl in squid to allow fenari/zwinger purges, purged special:random, restarted mobile1 stuff
February 23
23:41 domas: stopped nginx on en.m until we resolve special:random issue
20:53 logmsgbot: root synchronized php-1.5/wmf-config/InitialiseSettings.php 'Bug 22628 - Logo of Chamorro Wikipedia'
17:23 Rob: drac setup on new servers ersch, formey, alsted, and kaulen. still need port allocation.
16:47 Rob: forgot to add mgmt, updating dns
16:43 Rob: updating dns for new servers ersch, formey, alsted, & kaulen
15:37 Rob: FINALLY got in the last RMA drive for storage1, arrays were toast (losing 1/2 the drives in a raid10 and having the unlucky chance of them being paired), reinstall in progress
12:50 RoanKattouw: Cleaned up my dblist voodoo, everything should be back to normal now
12:41 RoanKattouw: Fixed sync-dblist, was broken
12:31 RoanKattouw: Syncing pmtpa.dblist anyway, seems I have to
12:24 RoanKattouw: Temporarily adding wikis listed in bug 20325 to pmtpa.dblist so I can export their contents. Please don't sync this
February 22
23:40 Tim: restarted replication on ms2
23:39 Tim: ms1 is up with a full copy of ms3 data, now replicating cluster22/rc1
21:09 logmsgbot: mark synchronized php-1.5/wmf-config/InitialiseSettings.php 'Revert'
20:30 logmsgbot: root synchronized php-1.5/wmf-config/InitialiseSettings.php 'Bug 22522 - Allow bureaucrats to remove sysop flag on Simple English Wikitionary'
20:09 Rob: finished the setup of srv217 (just needed puppet fixed and files updated)
20:02 logmsgbot: root ran sync-common-all
20:01 logmsgbot: mark synchronized php-1.5/wmf-config/InitialiseSettings.php 'Set vector as default skin on nlwikimedia'
19:44 logmsgbot: root ran sync-common-all
18:10 Fred: ganglia is going to hiccup a little for the next few minutes while I restore data... "Don't panic"
07:53 Tim: rebooted srv206 via drac
07:48 logmsgbot: tstarling synchronized php-1.5/wmf-config/mc.php 'removing srv206, is down'
06:19 Tim: increased multicast TTL from 1 to 3 in gmond_template.erb, to fix ganglia on ms*
05:54 Tim: started ms2 -> ms1 copy using nc+tarpipe
05:30 Tim: shutting down mysql on ms2 for copy to ms1
05:09 logmsgbot: tstarling synchronized php-1.5/wmf-config/db.php 'depooling ms2 for copy to ms1'
February 21
02:43 Fred: puppettized user gmetric
February 20
22:37 mark: Installed Ubuntu 8.04 on ms1 - now ready for ES deployment
February 19
18:48 Fred: removed memcached node srv193 and replaced with srv233
16:52 Rob: updated InitialiseSettings with bug 22585
01:31 Fred: puppetized gmond.conf generation using templates
February 18
23:51 RoanKattouw: Fixed search script on yongle, removed UA check exemption for yongle
23:51 RoanKattouw: logmsgbot is broken again
04:07 Tim: re-running storageTypeStats.php on enwiki to identify rows with old_flags='object,utf-8', which need to be handled properly in fixBug20757.php
February 17
17:25 Rob: pushed updates to wordpress installations
17:04 RoanKattouw: Synced wmf-config/checkers.php 'Allow missing UA from yongle'
17:04 RoanKattouw: Restarted logmsgbot
16:47 Fred: test
07:24 Fred: there is a problem with ganglia. It is being worked on.
00:15 RoanKattouw: Resynced srv151, was returning empty responses
February 16
23:31 Fred: upgrading gmond to 3.1.2 everywhere. However, due to the newish module structure, there is a potential that ganglia will hiccup while puppet does its job...
22:03 logmsgbot_: mark synchronized php-1.5/wmf-config/checkers.php 'Exceptions'
20:30 Rob: srv127 is online, but not in LVS. Its cert was accepted on sockpuppet, but puppetd --test results in a cert failure on retrieval from sockpuppet.
20:17 Rob: rebooting srv127
20:15 mark: Increased membufs to 40 per COSS dir on the pmtpa upload backend squids
19:51 mark: Increased membufs per COSS dir from 10 to 20 on the new pmtpa squids
18:10 apergos: but documentation can save people precious time when things are on fire
18:02 mark: Documentation is not a substitution for thinking
16:44 mark: Fixed puppet on most servers
16:23 domas: anyone knows why mysqldump on snapshot3 is locking tables? maybe --single-transaction could work better?!!!?
16:22 RoanKattouw: Strike my last, I hear it'll fix itself in a day
16:21 RoanKattouw: Oops, meant to type srv187-189
16:21 RoanKattouw: Ganglia not picking up data for srv1987-189, 193, 214-218, 250-253, 257 even though the boxes are up and gmond is running; been like this for 3 days
15:21 mark: Removing all puppet certs/private keys on all machines
14:55 mark: Puppetmaster is screwed up by a wrong command that deleted all files under /var/lib/puppet
14:25 logmsgbot_: midom synchronized php-1.5/includes/diff/DifferenceInterface.php 'instrumenting costs by revision pairs'
13:46 logmsgbot_: root synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 22444 and 22330 enabling collection on dawiki and ruwiki'
13:22 domas: oh wait, it does work, something else....
13:16 domas: apparently my squid rule for Apple browsers didn't really work, hence the continuing spikes
03:16 domas: blocked MacOSX Atom syndication on the edge, this will save terabytes of diskspace on unsuspecting Safari user computers :-)
19:15 Rob: sq67 has both disks installed now (had to get one replaced) and has no OS installation yet. holding off on install, not sure if mark wants this up as a squid or for varnish development
18:22 atglenn: restarting transfer of data (one chunk) from ms8->ms5->ms6, running in screen as root on all three hosts using nc
16:18 Rob: setup project1 and project2 boxes for flaggedrevs development and project management suite testing
15:12 mark: Upgraded our AMS-IX connection from 2x 1G to 10G along the way
15:12 mark: Reloaded br1-knams twice to fix a CAM partitioning problem; not sufficient IP next hops for 8-path routes
11:34 mark: Filtering all prefixes to/from AMS-IX peers
02:04 Tim: installed openssh-server on bayes, Erik Z apparently uninstalled it with aptitude
01:37 logmsgbot: tstarling synchronized php-1.5/wmf-config/CommonSettings.php 'disabled WikiEditor due to complaints about bug 22428'
01:06 atglenn: shoveling over some data from ms8 to ms6 via ms5 :-/
00:48 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/lqt.js 'Disable live preview entirely, broken by CentralNotice and DismissableSiteNotice'
00:37 logmsgbot: andrew synchronized php-1.5/wmf-config/InitialiseSettings.php 'Disable DismissableSiteNotice for lqt.labs, breaks LiquidThreads live preview'
00:17 logmsgbot: andrew synchronized php-1.5/wmf-config/CommonSettings.php 'Bump style version'
00:07 Andrew: Running scap
00:02 Andrew: Updating LiquidThreads alpha to trunk state, deploying with scap in a few minutes
February 11
22:06 Fred: implemented disk space check for tridge
21:10 mark: Pooled sq71-85 with full CARP weight (30)
21:06 mark: Pooled sq79-85 frontends in LVS with full load (30)
20:59 mark: Pooled sq79-85 frontends in LVS with load 1
20:51 mark: Pooling sq79-85 in CARP on all frontends, with low CARP weight 10 instead of 30
20:30 mark: Pooling sq79-85 in CARP on frontend sq51 to seed the cache
19:39 atglenn: replication every 15 minutes enabled from ms7 to ms8
19:31 mark: Removed sq16-30 from the text frontend squid pool in LVS - these servers will soon be decommissioned
19:28 mark: Pooled sq71-78 frontend squids in LVS with full load (30)
19:21 mark: Pooled sq71-78 frontend squids in LVS with low load (1) to seed caches
18:30 mark: Pooling sq71-78 backend squids on all frontends, with lower CARP weight (10 instead of 30) to seed the caches
18:04 mark: Deployed new squid config to sq58-66 frontend squid, to seed the caches of backend squids sq71-78
17:27 mark: Deployed new squid config to sq66 frontend squid, to seed the caches of backend squids sq71-78
17:24 mark: Deployed Squid with correct configuration on sq71-78
11:08 RoanKattouw: Load spike on 4CPU Apaches ended, rr.knams back up, downtime seems to be over
11:05 RoanKattouw: Another load spike on the 8 CPU Apaches, approx 11:02-11:05 UTC
11:00 RoanKattouw: Apache load spike started around 10:42 UTC and coincides with a load dip followed by a monitoring blackout on the Kennisnet Squids and a load spike on the Tampa Squids
10:58 RoanKattouw: 8CPU Apaches stopped going bonkers all of a sudden according to Ganglia, 4CPU ones still have high CPU usage
10:56 RoanKattouw: Site went down, all Apaches have high CPU usage
February 10
22:31 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Bump style version appendix to deploy usability changes sitewide'
22:22 logmsgbot: catrope synchronized php-1.5/extensions/UsabilityInitiative/js/plugins.combined.min.js 'Deploy r62275 to test'
21:28 logmsgbot: catrope synchronized php-1.5/extensions/UsabilityInitiative/js/plugins.combined.min.js 'Deploy r62264 for real. Running svn up first helps'
19:05 Rob: storage1 disks replaced, except for the newly dead one, rma placed.
18:39 Rob: shutting down storage1 to replace its bad OS mirror disk
16:52 mark: Fixed MySQL permissions on srv186 as well
16:15 mark: Rigged puppet to deploy a PHP mail.ini file to the apaches, which sets it to call sendmail with -f <> (sending mail with empty envelope sender, like bounces)
February 9
22:06 Andrew: LiquidThreads production software updates completed successfully.
21:58 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/classes/Dispatch.php
21:57 mark: Firewalled one ip on singer which was overloading Apache, looked like a Wordpress attack
21:50 mark: Lowered MaxClients setting to 250 on singer
21:47 Andrew: Running scap to deploy LiquidThreads updates
21:45 Andrew: Updating LiquidThreads production deployments to the alpha version in the next few minutes.
21:44 mark: Power cycled singer
21:19 mark: Fixed MySQL system user & permissions on srv151-srv185
21:17 mark: Fixed MySQL instance on srv183
20:47 mark: Fixed MySQL instance on srv156
19:39 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Bump style version appendix'
19:29 logmsgbot: catrope synchronized php-1.5/skins/common/edit.js 'Deploy r62190. Only for test for now, thank God for style versions'
17:01 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Bump style version appendix'
02:14 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads_alpha/lqt.js 'Merge r62158'
February 8
23:25 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Bump style version appendix for r62041. Varnish needs this or Tampa will serve stale JS off bits'
23:06 mark: Deploying Exim on all application servers as well
06:18 Tim: cleared out db9 root partition (which had 370MB free) by moving 4GB of SQL files from /root and /home/tfinc to /a/backup/junk
06:01 logmsgbot: tstarling synchronized php-1.5/wmf-config/db.php 're-added ms2 as an ES slave for rc1,cluster22 and the ex-fedora clusters, was depooled for ~6 months'
05:55 Tim: added grant for root@fenari to ms3/2
05:31 Tim: started apache on srv213, srv161 and srv240, was stopped for no apparent reason
04:52 logmsgbot: andrew synchronized php-1.5/includes/ChangesList.php 'Merge r62117'
04:37 Andrew: Deploying LiquidThreads alpha updates with scap
04:23 Andrew: Planning to deploy LiquidThreads alpha (liquidthreads.labs and test) in the next few minutes
01:15 Tim: added mysql and gmetric users on srv155 temporarily, to get those services back up
00:42 Tim: ran apt-get upgrade on srv155. Doing reboot test for new kernel.
February 7
23:36 Andrew: Clarification: I meant "somebody should fix it", not "I think it's fixed now".
23:35 Andrew: The issue of centralnotice pulling notices from the wrong place should be fixed now, though
23:34 Andrew: Moved /mnt/upload5/centralnotice/centralnotice.js and /mnt/upload5/centralnotice/wikipedia/centralnotice.js to centralnotice-old.js, should stop fundraising banners from appearing.
06:00 logmsgbot: andrew synchronized php-1.5/wmf-config/liquidthreads.php 'Re-activate LiquidThreads email notifications, DoS has passed'
05:22 Tim: freed up a little space on ms3 by reducing expire_logs_days from 14 to 7, and running flush logs
02:45 apergos: cleared out /tmp on srv169...
00:23 logmsgbot: catrope synchronized php-1.5/extensions/UsabilityInitiative/UsabilityInitiative.hooks.php 'Bump style version for jQuery UI stylesheet'
00:06 logmsgbot: catrope synchronized php-1.5/extensions/UsabilityInitiative/UsabilityInitiative.hooks.php 'Bump style version for combined.min.css'
00:03 logmsgbot: catrope synchronized php-1.5/extensions/UsabilityInitiative/css/combined.min.css 'Resync this, doesn't seem to have been picked up right'
February 4
23:55 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Bump style version appendix'
23:54 RoanKattouw: Deploying r61959 using individual sync-files
22:35 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Explicitly enable EditWarning by default here. Previously done in extension file'
17:32 mark: Set root password on all servers using Puppet
17:26 Rob: loaded sq68 full with 6 ssd, had to 'borrow' drives from the unracked squids in pmtpa awaiting the rail issue resolution.
17:26 Rob: connected sq67 and sq68 to secondary network ports as requested
16:43 Rob: had to update gmetad and add sq14 and sq17 into puppet as ganglia aggregator hosts. Currently not showing in ganglia, but should when puppet updates their files shortly.
16:37 Rob: updated gmetad_pmtpa in NFS store, as well as on zwinger and spence.
16:20 Rob: cleaned sq1-sq10 certificates off sockpuppet
16:10 mark: Rebooting sockpuppet for security upgrades
16:06 Rob: updating DNS to remove lvs3.wikimedia.org, as it's technically lvs3.pmtpa.wmnet
16:03 Rob: removed sq1-sq10 in dsh groups, server roles, pybal (already done, just doublechecked), and now starting wipe and removing their network connections.
15:37 Rob: updated video plugin on techblog, had to comment out sumotv support line due to errors.
13:30 mark: Removed sq1-10 from the upload backend Squid pool, preparing for decommissioning
12:27 mark: Removed eiximenis from the text backend pool
12:21 mark: Restarted sq51 with hyperthreading disabled
11:31 Tim-away: exim was flooding its mainlog with errors like "User 'exim' has exceeded the 'max_user_connections' resource (current value: 150)". OTRS is apparently broken as a result. Set the limit to infinity to fix it.
11:10 Tim-away: debugging mysql connection errors from mchenry to db9
10:09 RoanKattouw: Deploying r61919 with individual sync-files
07:45 Tim: killed eximstats on mchenry, was sending the machine into swap
06:02 Andrew: Running scap to deploy LiquidThreads alpha updates
05:09 Andrew: Planning to update LiquidThreads alpha to trunk state in the next few minutes.
03:31 Tim: removing refreshLinks2 jobs from the enwiki job queue with namespace=0, 100k of them put there by a nasty biography template
00:34 domas: restarted varnish on db19
February 2
23:26 mark: PyBal got into a confused state due to a duplicate LVS realserver entry. Set up LVS manually/statically for 30 mins to debug the problem. PyBal is now active again.
23:04 Tim: pmtpa text squids down
22:33 mark: Pooled sq59-68 in LVS (Text)
22:27 apergos: restarted apache on srv170 (corrupted apc cache)
22:25 mark: Increased CARP weight of the new Upload squids sq51-58 from 10 to 30
22:17 mark: Added sq59-66 to the Text pool of Squids, with low CARP weight (10) to seed the caches
21:44 logmsgbot: andrew synchronized php-1.5/wmf-config/liquidthreads.php
21:42 Andrew: temporarily disabling $wgLqtEnotif to mitigate mail server DoS impact on LiquidThreads post/reply speed.
21:42 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads_alpha/classes/NewMessagesController.php 'Deploy r61879'
21:42 logmsgbot: andrew synchronized php-1.5/extensions/LiquidThreads/classes/NewMessagesController.php 'Deploy r61879'
21:04 mark: Removed sq2-10 from the LVS pool
20:46 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Bump style version addition for usability alpha'
22:12 apergos: restarted morebots, must have died after last night's freenode irc move
04:40 Tim: fixed broken /etc/rc.local on wikitech
04:18 Tim: restarting mysqld on db12
04:00 Tim: analysis: mysqld on db12 hit a bug at 02:45 and froze, most threads in futex. The mysql client failed to set a read timeout, leading to db12 sucking up all available apache threads. Several squids became overloaded, presumably due to the large size of our 503 error messages.
03:21 Tim: restarted all apaches, thereby killing the reads which started before 3:11, possibly as early as 02:45. Site service resumes immediately. ps -lL on db12: http://p.defau.lt/?Jw1EBE0fnV4Rpxe9OP8yAw
03:18 Tim: finally thought to answer the question "are the apaches waiting for something or idle?" Strace confirms that they are waiting for db12 in read().
03:11 Tim: Having trouble believing db12 could be responsible for the CPU spike on the squids. Depooled db12 just in case.
02:56 Andrew phones Tim
02:45 retrolog: db12 ganglia graphs go flat. Apache CPU goes down to near zero. CPU saturates on some of the text squids: sq16, sq17, sq18, sq21, sq22, sq23, sq25, sq26. Frontend squids serve 503s.
02:00 retrolog: Network spike on db12
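Note on the 04:00 analysis above: the point about the client not setting a read timeout can be shown with plain sockets. The sketch below is an illustration only, not the MediaWiki or MySQL client code. start_silent_listener and query_with_read_timeout are hypothetical helpers, and the listener (which completes the TCP handshake but never answers) stands in for a frozen server.

```python
import socket


def start_silent_listener():
    """Hypothetical stand-in for a frozen server.

    The kernel completes the TCP handshake for queued connections,
    but nothing ever reads the request or sends a reply.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    srv.listen(1)
    return srv


def query_with_read_timeout(port, timeout_s):
    """Send a request and wait at most timeout_s seconds for a reply.

    Returns the reply bytes, or None if the read timed out.
    """
    # The timeout given here also applies to the later recv() call.
    with socket.create_connection(("127.0.0.1", port), timeout=timeout_s) as s:
        s.sendall(b"SELECT 1\n")
        try:
            return s.recv(4096)
        except socket.timeout:
            # Give up so the caller is freed instead of waiting forever.
            return None


if __name__ == "__main__":
    srv = start_silent_listener()
    port = srv.getsockname()[1]
    print(query_with_read_timeout(port, 0.5))
    srv.close()
```

With no timeout the recv() call would block indefinitely, which mirrors how each waiting request kept an apache thread occupied during the incident.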
January 29
23:21 domas: restarted sq37 for being in bad shape.
22:25 tomaszf: starting xml snapshots back up on snapshot3
21:31 atglenn: cleared out cruft from /tmp on srv158 (how about a cron job? :-P )
19:05 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 22201: Add WZ: alias on itwiktionary'
15:22 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'Enable $wgEnotifWatchlist on usabilitywiki'
11:38 mark: Deployed new gmond 3.0.3-2 package that conflicts with the incompatible ganglia 3.1 packages in Ubuntu, to avoid problems in our upcoming upgrade
January 21
14:13 mark: Added srv210 to LVS
14:11 mark: Reenabled srv245 in LVS
13:58 mark: Restarted Apache on srv207, srv210, srv233, srv257, srv245
09:30 mark: Replaced srv121 (decommissioned) by search13 in smokeping
07:59 AaronSchulz: Removed empty ct_tag rows from code_tags
07:47 Tim: cleaned up /tmp on srv180 (was out of disk space)
16:38 Fred: killed long running queries created by civicrm on db9.
13:56 domas: did lots of changes on wikitech wiki. :)
02:37 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'Disable API logging on enwiki'
01:20 logmsgbot: tfinc synchronized php-1.5/wmf-config/InitialiseSettings.php 'Enabling api logging for enwiki'
01:16 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'Disabling API request logging on dewiki'
00:03 RoanKattouw: Strike my last, was actually *enabling* it on dewiki
00:03 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'Disable API logging test on dewiki'
00:02 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'Disable API logging test on frwiki'
January 19
21:37 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'Enabling API request logging on frwiki'
21:36 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Enabling API request logging on frwiki'
20:34 Fred: bugzilla upgrade completed. It might be up to an hour before attachments can be added due to DNS propagation.
19:41 Fred: Bugzilla as you know it is going down. Be back shortly.
18:21 Rob: updated blog.wikimedia.org wp-stats plugin to 1.6.1
16:03 Rob: ssds installed in db28, during boot it displays a memory error, investigating.
15:20 Rob: db28 offline while disks are swapped to solid state for testing.
January 18
05:02 Tim: removed some old xff logs to free up space for logrotate which I ran with the -f flag
05:00 Tim: fixed logrotate on nfs1, broken due to duplicated entries between /etc/logrotate.d/rsyslog and /etc/logrotate.d/syslog-ng. Left the syslog-ng ones in for now.
04:02 RoanKattouw: NFS /home is full
January 16
14:55 mark: Started backend squid on sq50
14:26 mark: sq19 has bad drive /dev/sdc
14:22 mark: dist-upgrade & reboot on sq19
14:14 mark: Shutdown sq20, bad disk /dev/sda
14:13 mark: Reenabled sq24 frontend in PyBal
14:03 Tim-away: disk space critical on srv167, cleaned up /tmp
06:07 apergos: "recovery", my *ss... cleaned up /tmp on srv181 to get some space back
January 15
23:53 tomaszf: starting test dewiki snapshot on snapshot2
17:17 Fred: modified gmetad config on zwinger and spence to reflect new apache 4cpu aggregator
17:16 Fred: added srv149 as a gmond aggregator in puppet.
January 14
17:28 domas: load-tested and fixed db19 to handle full bits workload (~22k/s), now again serving just tampa part.
01:32 apergos: moved bits to .2 again, all nameservers seem to be up and reflect the change (bits unresponsive again)
January 13
18:30 Andrew: [andrew@zwinger ~]$ sync-file wmf-config/CommonSettings.php 'Disable GIF scaling again, due to issues reported on village pump, bug 22041'
17:05 Rob: updated InitialiseSettings for bug 21174 and 21077
01:17 Tim: on streber: removed a corrupt torrus DB file so it could be rebuilt, torrus should be working now
00:57 Tim: killed frozen torrus cron jobs and ran "torrus compile --tree=Network --force"
00:51 Tim: maybe torrus collector is still broken, trying /etc/init.d/torrus-common force-reload
00:46 Tim: with mpm-prefork managed to debug it fairly easily. Moved away permanently locked DB file render_cache.db, torrus.wikimedia.org is now fixed
00:39 Fred: restarting pdns on ns1
00:38 Tim: switching streber to apache2-mpm-prefork, can't work out why it's not working
00:22 Tim: trying "apache2 -X" on streber
00:00 Tim: restarting apache on streber
January 11
23:38 domas: logging the fact that we had cache layer meltdown at some point in time during the day
22:30 domas: leaving bits.pmtpa on db19's varnish, in case of troubles - uncomment bits.pmtpa .2 record in /etc/powerdns/templates/wikimedia.org and run authdns-update
19:43 logmsgbot: fvassard synchronized php-1.5/wmf-config/mc.php 'Swapped memcached from srv125 to srv232'
19:06 Rob: new apaches srv255, srv257 deployed. Updated node groups and synced nagios
19:03 Rob: new apache server srv254 deployed
18:24 atglenn: copy backlog of image data from ms1 to ms7 (running in screen as root on both boxes)
14:43 mark: Rebooting fuchsia, locked up again
14:24 mark: Increased load on knsq16-22 by upping lvs weight from 10 to 15
14:31 Rob: decommissioned srv120, srv121 to make room for new search servers.
14:29 Rob: pulling srv34 from rack to decommission, need the space for new search servers.
14:09 logmsgbot: mark synchronized php-1.5/wmf-config/mc.php 'Replace memcached on just decommissioned host srv82'
13:57 Rob: srv82, srv83, srv84 decommissioned to make room for new search servers in rack
12:56 Tim: started squid-frontend on knsq8, knsq10, knsq11, knsq13, knsq14, knsq15, all crashed at roughly the same time as knsq9
12:52 Tim: started squid-frontend on knsq9, died at ~17:30 on the 3rd. Syslog shows many crashes, followed by "out of socket memory" a couple of hundred times, then silence
12:50 RoanKattouw: Started logmsgbot (running as catrope instead of nobody)
12:32 RoanKattouw: logmsgbot down since fenari reboot, needs to be restarted by a root
December 28
21:02 RoanKattouw: Restarting refreshLinks for enwikibooks on hume
20:49 mark: Created 10G of swap space on fenari
20:07 domas: powercycled fenari, console not responsive
15:00 mark: Installed Karmic on LVS1, setting up a test network for LVS performance testing
14:55 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 17338: Disable $wgRestrictDisplayTitle and enable subpages in the main namespace on rmwiki'
11:26 domas: db20 offlined disk array, after reboot booted into netinstall (but saw the array), after reset /SYS, and some operator's no-operation in BIOS and RAID controller setup screens, it booted up properly
December 26
19:59 DaBPunkt: (non-dev-entry) Many apaches in the US died, and some DB servers reported being overloaded. The problem resolved itself after a few minutes.
15:15 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'Fix name of oldwikisource -> sourceswiki'
15:11 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'Enable ProofreadPage for oldwikisource, which strangely is not part of the wikisource group'
14:35 Rob: db27 back online with new fan controller board.
13:53 Rob: shutting down mysql on db27 for hardware replacement.
12:56 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 21517: Actually enable patrolling on frwikibooks and frwiktionary'
08:22 Tim: on isidore, also disabled the CentralNotice job that was running every 20 minutes from /etc/crontab
05:16 Tim: disabled CentralNotice rebuilds for donate.dev.wikimedia.org, was overloading isidore (which is only a single-core pentium)
December 22
23:45 logmsgbot: fvassard synchronized php-1.5/wmf-config/CommonSettings.php 'Added 2009_Notice49 to tracking.'
December 21
19:25 Rob: ms8 still borked.
19:24 Rob: restarted mysql on db27
19:12 Rob: removed the fan boards in db27 and cleaned connections (were filthy with toner or dust or something) and replaced them. system booting back online. (Will watch it for errors for the next week.)
18:36 Rob: shutting down db27 mysql manually for troubleshooting hardware on the system
18:04 Rob: db28 mainboard and fans replaced, booting back online.
17:35 Rob: restarted mysql on srv185, since it appears to be an ext. storage slave.
17:32 Rob: srv185 memory replaced, back online.
17:20 Rob: shutting down srv185 to swap bad dimm1
16:52 Rob: swapped out bad disk in db30, all leds are green now.
16:22 Rob: swapped out bad memory in sq32, booting it back up.
16:13 Rob: shutting down sq32 to swap out bad memory
16:04 RoanKattouw: Running namespaceDupes on brwiki for bug 21417
16:03 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 21417: Change name of project namespace on brwiki, add old name as alias'
15:52 mark: Killed memcached on browne
15:51 mark: Started ircd on browne
15:51 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 21041: Allow sysops to grant and bcrats to remove transwiki on nowiki'
15:46 Rob: browne back online after move.
15:46 Rob: nagios synced to node group files without will.
15:45 Rob: will decommissioned, pulled from rack
15:40 Rob: shutting down browne to move its rack location, will be back online shortly.
15:36 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 21071: Add patroller and autopatroller groups on hewiki'
15:26 mark: Moved udpmcast from browne to dobson
13:11 mark: Fixed Racktables rack thumb problem by installing php5-gd; it was just serving cached thumbs
11:06 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 21306: Allow sysops to add/remove patroller and autopatrol on hrwiki'
11:00 RoanKattouw: Running namespaceDupes on mlwiki for bug 21277
10:56 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 21277: Add Portal and Portal talk as aliases on mlwiki'
10:53 RoanKattouw: Running namespaceDupes.php on cawikibooks for bug 20980
16:36 hcatlin: new and improved mobile with compressed memcached + new homepage + better utf-8 handling
16:24 hcatlin: taking down mobile1 for a large software update
15:59 mark: Disabled NIS on zwinger
15:54 mark: Disabled NIS on ms1
15:53 mark: Disabled NIS on ms4
15:47 mark: Installed puppet on ms1 and ms7
15:27 Andrew: srv129, srv123, srv120, srv95 seem to be in swapdeath
15:25 logmsgbot: andrew ran sync-common-all
15:24 Andrew: sync-common-all caused memory spike on apaches again, site seems to still be up though
15:17 Andrew: Updating LiquidThreads to trunk state, using sync-common-all
12:58 RoanKattouw: Tim ran rebuildTemplates.php on hume for all languages
12:52 Andrew: Fixed morebots, was choking because the Server Admin Log was archived.
12:50 Andrew: Moved Morebots init script into a line in rc.local. Morebots seems to have been down due to Freenode's DDoS problem, maybe it isn't exiting properly when disconnected from the server
12:46 Tim: ran rebuildTemplates.php on hume for all languages