18:39 RoanKattouw: Running extensions/Vector/switchExperimentPrefs.php on enwiki to disable WikiLove for those with the "Exclude from experiments" preference
18:38 logmsgbot: catrope synchronized php-1.17/wmf-config/InitialiseSettings.php 'Enable WikiLove by default on enwiki'
18:17 RoanKattouw: Deploying updates for ArticleFeedback and WikiLove
18:09 RobH: srv169 still offline, do not repool (set to false in pybal on lvs3)
18:00 RobH: doing reboot testing on srv169 and puppet testing. server is depooled, disregard notices.
17:56 RobH: depooled srv169, it's not updating its /var/local/apache via puppet for some reason.
17:53 logmsgbot: catrope synchronized php-1.17/wmf-config/CommonSettings.php 'Add code enabling WikiLove by default if $wmgWikiLoveDefault is true'
17:52 logmsgbot: catrope synchronized php-1.17/wmf-config/InitialiseSettings.php 'Add wmgWikiLoveDefault variable, set to false'
17:48 RobH: blog.wikimedia.org updated to wp3.1.4
16:48 RobH: srv169 reinstalled and returned to service. (lvs3 is not repooling, citing a key error that has since been removed; perhaps pybal needs to be refreshed?)
16:25 RobH: reinstalling srv169
10:57 mark: Changed preferred path for AS14907 (pmtpa/eqiad) - AS43821 (esams) from 14907 2828 6908 43821 to 14907 3257 1299 43821 (bidirectionally)
23:03 RobH: db1001-db1048 (minus a few hosts mentioned in rt#1105) are installed, puppetized, and ready to be placed into replication service.
22:53 logmsgbot: awjrichards synchronized php-1.17/wmf-config/CommonSettings.php 'Making CentralNotice config changes for test wiki (page path and CentralDBname)'
21:03 RobH: OS install running on db1033-db1048
20:53 RobH: db1033-db1048 raid setup, ready for OS installs
20:44 mark: Started rsync of ms5 commons thumb dir '1' to ms1004
20:23 RobH: working on db1033-db1048 (setting up hardware raid)
20:10 RobH: db1017-db1031 os installed, puppet updated, ready for replication setup (db1023 & db1032 need further troubleshooting)
19:54 RobH: os install on db1017-db1031 done, running puppet updates
19:45 mark: Started rsync of ms5 commons thumb dir '0' to ms1004
19:36 mark: Deployed ms1004 (eqiad) as caching thumbs server in a similar configuration as ms6.esams; requesting thumbs from ms5 and caching (and purging) them locally
19:35 RobH: db1023 and db1032 DRAC unresponsive, rt#1111 and rt#1112 to troubleshoot during next onsite visit
19:23 RobH: rob is working installs on db1017-db1032
19:08 RobH: updated netboot.cfg for private1-b-eqiad broadcast
23:29 RobH: db1003 and db1004 os installed, puppet updated, ready to be put into replication service
22:46 RobH: db1001 installed and updated with puppet, ready to be put into replication service
22:45 RobH: db1002 installed and updated with puppet, ready to be put into replication service
22:44 RobH: db1001 is installed with OS and up (has been for a while), db1002 and db1003 finishing installs
22:24 RobH: carbon host online as install tftp host in eqiad.
22:10 RobH: syncing docroot and gracefully restarting apaches to take the update
22:01 RobH: carbon rebooting
21:43 RobH: testing tftp function of carbon from cp1001 as it's on the same public vlan network
21:13 RobH: updated puppet to point tftp hosts in eqiad to carbon
21:13 RobH: rsyncing over the tftpboot directory from brewster to carbon
21:00 logmsgbot: pdhanda synchronized php-1.17/wmf-config/InitialiseSettings.php 'Bug 28781. Change requested by jamesofur on irc'
20:51 logmsgbot: pdhanda synchronized php-1.17/wmf-config/InitialiseSettings.php 'Bug 28781. This time without breaking the world'
20:33 RobH: reinstalling carbon
20:27 RobH: updated misc::install-server and brewster/carbon entries in puppet, install server roles are now called on a service-by-service basis. brewster also needs a reboot for a kernel update, so doing that now as well.
17:47 RobH: carbon host powered down, the misc::install-server class in puppet needs to be broken out into more specialized roles, carbon reinstalled and updated to just host the tftp roles of the install server
17:36 RobH: new install host carbon os installed, running puppet updates
17:11 RobH: copied db41 sudoers file to db40 since it was overwritten by old raid utils package. now maybe the log spam will stop
15:20 RoanKattouw: NTP has synced clock in prototype, it's got the correct time now
15:18 RoanKattouw: Installed ntp on prototype, its clock was 10 minutes behind. It seems to be slowly catching up now
14:58 logmsgbot: midom synchronized php-1.17/wmf-config/InitialiseSettings.php 'disabling while my eyes are on other stuff'
14:52 RobH: stat1 racked and remote accessible. has public vlan setup, needs IP allocated and os installed. (also needs to be added to racktables)
13:59 mark: Partly implemented European routing policy on AS14907 as well, and tuned a bit for routing southern routes via Tampa
20:00 andrewS: pmtpa: we should have power and the overhead railings to run data cables by late monday, as per Matt.
19:20 Ryan_Lane: depooled srv301
12:37 Tim-away: finished fixing ExtensionDistributor, re-enabled cron job and xinetd
07:37 Tim: XO appears to be back up, after ~3 hours of downtime
07:30 Tim: fixing ExtensionDistributor snapshot permissions, was owned by mark instead of extdist
06:21 logmsgbot: jeluf synchronized php-1.17/wmf-config/InitialiseSettings.php '29168 - Define namespace aliases for Author, Page, Index on all Wikisources'
06:10 Tim: configured csw5-pmtpa to send its syslogs to 10.0.5.8 instead of 10.0.0.1
00:19 logmsgbot: tstarling synchronized php-1.17/includes/db/DatabaseMysql.php 'reapplied r90644, it was fixing an actual problem'
16:47 logmsgbot: tstarling synchronized php-1.17/wmf-config/db.php 'attempting to set max threads on testwiki only'
16:32 RoanKattouw: Site came back up instantly after Tim disabled max_threads
16:28 logmsgbot: tstarling synchronized php-1.17/wmf-config/db.php 'disabled max threads'
16:28 logmsgbot: catrope synchronized php-1.17/includes/db/LoadBalancer.php 'Make the "LB failure with no last connection" DB error log message more verbose'
16:24 binasher: upping max threads in db.php
16:24 logmsgbot: root synchronized php-1.17/wmf-config/db.php 'upping max threads'
16:17 logmsgbot: laner synchronized php-1.17/includes/db/DatabaseMysql.php 'Roll back Tim's getLag() change from 03:18 today (r90644)'
15:58 Ryan_Lane: restarting pybal on lvs3
15:52 logmsgbot: catrope synchronized php-1.17/includes/db/Database.php 'Roll back Tim's getLag() change from 03:18 today (r90644)'
15:50 RoanKattouw: Going to roll this back: 03:18 logmsgbot: tstarling synchronized php-1.17/includes/db/DatabaseMysql.php 'Database::getLag() fix from r90644'
15:46 RoanKattouw: Lots of "LB failure with no last connection" errors in dberror.log, suddenly started at 10:53 UTC
15:43 RoanKattouw: Torrus broken, reporting lots of NaNs
15:40 RoanKattouw: Strike that, site not completely down. Intermittent service reported in US/Canada
15:39 RoanKattouw: AMS squids all down in Nagios. Load is approximately halved on app servers. TPA squids look like they should be fine, but the site is reported to be down worldwide
15:37 RoanKattouw: Mark not near a computer, texting Ryan instead
15:36 RoanKattouw: All wikis seem to be down worldwide. Have texted mark
09:14 logmsgbot: jeluf synchronized php-1.17/wmf-config/InitialiseSettings.php '29380 - Change logo of Kannada wiktionary (kn.wiktionary.org)'
08:48 logmsgbot: jeluf synchronized php-1.17/wmf-config/InitialiseSettings.php '29515 - install Narayam in all Sanskrit wiki projects'
08:42 logmsgbot: jeluf synchronized php-1.17/wmf-config/InitialiseSettings.php '29474 - Search Index and Author namespaces by default on all Wikisources which don't have a different configuration'
08:27 logmsgbot: jeluf synchronized php-1.17/wmf-config/InitialiseSettings.php '29172 - Add Page, Author and Index namespaces to $wgContentNamespaces on all Wikisources'
07:53 JeLuF: fixed broken ganglia installation on project2 (reported by puppetmon)
07:51 JeLuF: added new class mobile::disabled in puppet that has apache disabled, assigned mobile1 to it.
19:25 logmsgbot: kate synchronized php-1.17/wmf-config/db.php 'taking db18 out of rotation to dump s7 for TS'
19:26 river: taking db18 out of rotation to dump s7 for TS
18:56 Ryan_Lane: moving preilly from restricted to mortals, for deploy access
15:01 apergos: deleted the old ssh key for snap2.wm.o from the puppet db also, as fenari was whining about it similarly, hope that gets it. spence puppet run *finally* done, sheesh
13:17 apergos: after changing snapshot2 from external to internal subnet (oops forgot to log that huh), puppet on spence was complaining about the external resource Nagios_host[snapshot2] not being able to override the local resource, so in puppet db on db9 I deleted the old snapshot2 nagios resources, leaving only the new ones, puppet run is in process now on spence and is bleeping sloooow.
09:14 apergos: taking down snapshot2 in prep for os upgrade (reinstall)
01:52 logmsgbot: demon synchronized php-1.17/wmf-config/InitialiseSettings.php 'Fix checkuserwiki permissions back to how they were'
18:51 logmsgbot: root synchronized php-1.17/wmf-config/db.php 'drop db29 weight to 0'
18:21 logmsgbot: root synchronized php-1.17/wmf-config/db.php 'adding db43 to s6 with a weight of 20'
18:21 binasher: increased db43 weight to 1500
18:07 binasher: just put db43 into rotation in the s6 cluster
18:04 logmsgbot: root synchronized php-1.17/wmf-config/db.php 'adding db43 to s6 with a weight of 20'
16:53 pdhanda: Upgrading wikitech.wikimedia.org to 1.17
15:43 apergos: started en pedia dumps run on snapshot4, from root screen session as backup user. This will run as 32 jobs in parallel; we should keep half an eye on the db in about 16-17 hours (previous runs had only 15 jobs going at once).
14:18 mark: Configured TiNet transit on cr2-eqiad (BGP sessions deactivated)
14:18 mark: Configured border ACL filtering on the eqiad border
23:03 Ryan_Lane: ran sync-docroot - how does this not get logged?
19:16 RobH: snapshot4 properly boots to serial console display on bootloader and os
18:31 RobH: snapshot4 rebooting to troubleshoot console redirection
13:27 apergos: snapshot4 installed and *almost* correct (stray apache started up but the rest looks ok)
June 12
19:47 JeLuF: Software RAID on fenari was broken, mirror sdb1 was failed. Re-added sdb1. If the error is persistent, sdb needs to be replaced.
19:19 JeLuF: cron.{hourly,daily,...} on hume was not running due to a broken record in /etc/crontab, so logrotate was not running
12:58 logmsgbot: ariel synchronized php-1.17/includes/HistoryBlob.php 'check for existence of mhash function, not extension (need for lucid php build) (take 2, after svn up :-P)'
12:51 logmsgbot: ariel synchronized php-1.17/includes/HistoryBlob.php 'check for existence of mhash function, not extension (need for lucid php build)'
June 11
07:37 logmsgbot: catrope synchronized php-1.17/wmf-config/CommonSettings.php 'Move section disabling minordefault out of the if($wmgUseUsabilityInitiative) conditional and document it'
07:12 RoanKattouw: Running script to fix preferences of users that can't turn off minordefault on enwiki. See bug 24313 comment 55. Script is in maintenance/fixMinorPrefs.php
June 10
21:26 binasher: mobile2 goes natty
18:08 Ryan_Lane: added ip and dns entry for internproxy.wikimedia.org in the sandbox vlan
13:58 mark: Setup OSPF and OSPFv3 on the first 10G wave circuit between csw1-sdtpa and cr1-eqiad
13:39 RobH: updated puppet giving andrew shields basic restricted shell access to fenari
11:08 mark: Setup quick log rotation of ipv6and4 on maerlant
04:51 logmsgbot: demon synchronized php-1.17/wmf-config/CommonSettings.php 'Let normal users test/signoff on revisions, just like we let them post comments. Restricting it to "coders" kind of defeats the purpose'
June 6
20:22 logmsgbot: hashar: ci.tesla: cleaned up backlog and /tmp (I really need to fix this)
17:00 logmsgbot: hashar: ran namespaceDupes.php frwikiversity --fix (bug 29015)
14:39 Reedy: Poking at CentralAuth (bug 28767) on testwiki. If things get more broken, it's probably me
09:32 apergos: there's a hung tcpdump on amssq37 that won't die. it doesn't seem to be doing much but it won't die either (that was me looking at purge issues)
21:13 logmsgbot: catrope synchronized php-1.17/load.php 'Revert live hack'
21:04 logmsgbot: catrope synchronized php-1.17/load.php 'Add live hack for profiling'
20:48 logmsgbot: pdhanda synchronized php-1.17/wmf-config/InitialiseSettings.php 'Bug 29129. Added the new namespace to be searched by default for metawiki.'
12:10 Andrew: Installing SecurePoll DB tables on wikis in ~/centralauth_wikis_missing_securepoll
07:57 apergos: the images Rl2-quality-full-003.jpg and Picture_602.11.85.5.jpg have been removed from our image server permanently, office action (legal request).
06:02 apergos: attempting to reboot ekrem (was not pingable)
May 26
23:20 notpeter: powercycling knsq5
21:34 Ryan_Lane: adding en.prototype, de.prototype and test.prototype cnames for prototype.wikimedia.org
18:58 logmsgbot: catrope synchronized php-1.17/wmf-config/CommonSettings.php 'Raise AFT tracking percentage from 2% to 10% (restores pre-ramp-up volume) and put all users in the show-expertise bucket. Also bump tracking version to 7'
18:54 Ryan_Lane: adding wikimedialbsecure service to amslvs1, for testing https
18:47 Ryan_Lane: adding wikimedia-lb.esams address to ams and kn text squids, also adding lvs configuration for this address
18:31 Ryan_Lane: removing mediawiki-lb.esams and foundation-lb.esams reverse addresses, they were assigned to the same ips as text.esams and bits.esams respectively
17:48 Ryan_Lane: adding forward entry for wikimedia-lb.pmtpa.wikimedia.org
15:34 mark: Moved AS13030 connection configuration from port 2/11 to port 2/11 on br1-knams
May 25
22:39 awjr: svn up'd on civicrm.wikimedia.org to r189 - deploying further optimizations to contribution searching
21:22 Ryan_Lane: adding wikimedia-lb svc address to squids
21:22 Ryan_Lane: adding wikimedialbsecure service to lvs2, for testing https access to wikimedia.org sites.
21:21 Ryan_Lane: modified pybal configuration to add wikimedia-lb ip to text backend
14:45 logmsgbot: demon synchronized php-1.17/wmf-config/InitialiseSettings.php 'bug 29064: close wikimania2009wiki'
14:43 logmsgbot: demon synchronized closed.dblist 'bug 29064: close wikimania2009wiki'
12:26 Tim: marked the deletion jobs in OTRS as "invalid" to avoid confusing error messages
12:25 Tim: re-enabled anti-deletion security patch in OTRS, disabled "temporarily" by Fred on June 4, 2009
11:58 domas: rebooting db9, looks unhealthy
11:06 apergos: so that was me on srv225: stopped apache, waited for puppet restart: it now syncs, then does the restart. oh btw srv225 isn't in the pybal apache conf file, weirdly... anyone know why?
09:15 apergos: restarted front and backend squids on sq75
08:47 apergos: restarted front and back end squids on sq81, front end was 20gb and back end was out to lunch
08:35 apergos: ekrem's root partition was full; cleared out some older logs from /var/log/apache2, compressed the most recently rotated (it had failed from disk full), reloaded puppet
07:20 Tim: set read_only=1 on db10, it's a slave so it should be read only. Nagios says it was broken for 6 months.
01:05 Tim: created an account for myself in watchmouse
00:48 Andrew: Running populateBv2011EditCount.php in a screen on hume
00:32 Ryan_Lane: pruning binlogs on db9
May 24
23:12 logmsgbot: pdhanda synchronized php-1.17/wmf-config/InitialiseSettings.php 'Bug 29129. Added a namespace alias for metawiki.'
20:36 notpeter_: resyncing drbd nfs1 to nfs2
20:28 RobH: mobile4 awaiting final configurations for service, mobile5 installation done, not yet run puppet updates
20:20 RobH: mobile4 is running puppet updates, mobile5 is mid-install
19:46 RobH: mobile4 & mobile5 racked and accessible for mgmt. installing OS on mobile4 now
18:00 Ryan_Lane: added new reverse zone for svc ips in esams; added entries for bits.svc.esams.wmnet and upload.svc.esams.wmnet
17:31 Ryan_Lane: removing bits-lb and upload-lb public ip entries, and adding bits.svc.pmtpa.wmnet and upload.svc.pmtpa.wmnet entries for https testing
16:46 mark: Upgraded cr2-eqiad to JUNOS 10.4R4.5
16:44 mark: PIM configuration was missing on some virtual interfaces on csw1-sdtpa, which broke purging to esams. Put back in place
16:09 mark: Reloading re1.cr2-eqiad
16:01 mark: Fixed tunnel hack between pmtpa and eqiad
15:01 RobH: db20 networking restarted, back online
14:58 RobH: Miguel is rebooting db18
13:55 mark: reloading csw5
13:52 domas: ns2 has been down for 3 hours, will try restarting pdns
13:47 mark: reloading asw-a4
13:45 mark: reloading asw-d3
13:39 domas: db18/20/28 are still down, may need power cycle. Took them out of rotation.
13:37 RobH: restarted networking on db33-39.
13:30 mark: restarted networking on db30-32.
13:25 domas: DB server issue appears to be a linux kernel issue, forcedeth-related, can be fixed with /etc/init.d/networking restart. Doing so on db10-30, asking mark to do the rest.
13:23 mark: restarted slot 3 since that's where the problematic database servers are
13:12 mark: csw1-sdtpa back up, but a whole lot of database servers are unreachable
13:07 mark: upgraded and reloaded csw1-sdtpa
12:05 mark: Upgraded bootrom and software on asw-d1-sdtpa and asw-a5-sdtpa
10:53 mark: Copied existing 2.7.0.2 image to secondary flash, and installed 2.7.0.3a image on primary flash on csw5-pmtpa
10:42 mark: Copied existing 2.6.0 image to secondary flash, and installed 2.7.0.3a image on primary flash on csw1-sdtpa
18:45 apergos: reloading puppet on all hosts, with a minute sleep in between, running in screen on fenari as root. needed for enabling pluginsync on clients
18:20 Ryan_Lane: moved ipv6 proxy from iris to maerlant
17:35 RobH: db20 back online, needs mysql setup for replication on another cluster. created rt#836
17:29 RobH: db20 was offline from earlier hardware issues that appear to have been resolved. it is in no current deployment group. booting it online to see if it requires reinstallation
21:23 awjr: running contribution auditing scripts on civicrm.wikimedia.org
21:21 awjr: enabling log audit module on civicrm.wikimedia.org
21:18 awjr: svn up'ing to r183 of wikimedia branch on production instance of civicrm on grosley
20:57 logmsgbot: pdhanda ran sync-common-all 'Bug 28717. Group changes and flagged revs enabled for bnwiki'
20:50 Ryan_Lane: maerlant too
20:50 Ryan_Lane: rebooting maerlent
20:13 logmsgbot: pdhanda synchronized php-1.17/wmf-config/InitialiseSettings.php 'Bug 28773. more rights and namespaces for stewardwiki'
17:16 RobH: updated dns for new mobile servers mgmt
17:12 RoanKattouw: Applying index changes on click_tracking_user_properties from r88462 to the cluster. Using /home/catrope/patch-click_tracking_user_properties-index.sql
16:06 ^demon: didn't work, had errors with ArticleAssessmentPilot, FlaggedRevs, LQT_alpha, and MessagesTp. Plus lots of rsync permission errors. Saved output to /home/demon/logs/l10nupdatefail.log
09:52 Tim: creating an account on wikitech for priyanka
09:51 logmsgbot: catrope synchronized php-1.17/wmf-config/CommonSettings.php 'Re-enable UploadWizard on Commons, adding note about how to switch it back off if it explodes'
09:39 logmsgbot: catrope synchronized php-1.17/extensions/UploadWizard/SpecialUploadWizard.php 'Fix fatal in live hack'
09:17 logmsgbot: catrope synchronized php-1.17/wmf-config/CommonSettings.php 'Re-enabling UploadWizard on test to hopefully reproduce ResourceLoader issue'
05:21 mark: Reduced passenger pool size to 200 on all mobile servers
04:46 mark: Depooled mobile2 and rebalanced LVS weights on m.wikipedia
May 16
21:56 logmsgbot: hashar synchronized php-1.17/wmf-config/InitialiseSettings.php '28504 - Need to remove all tokipona settings from InitialiseSettings.php'
21:52 logmsgbot: hashar: cleaning up settings files for tokipona stuff (bug 28504)
09:37 logmsgbot: ariel synchronized php-1.17/wmf-config/CommonSettings.php 'raise account throttle for el wiki for workshop (maybe I can talk the community into upping the threshold generally??)'
10:35 logmsgbot: hashar: had to manually delete a row from the ptwiki.langlink table (ll_from: 162503, ll_lang: zh-classic, ll_title: Template:Link A) (bug 28805)
10:27 logmsgbot: hashar: running update ignore query on ptwiki for bug28805
10:23 logmsgbot: hashar: batch for bug 28805 finished.
10:17 logmsgbot: hashar: running batch queries on all wikis for bug 28805
10:31 Tim: deployed a patch to gather i18n usage stats for siebrand
May 11
22:09 mark: Reloaded csw5-pmtpa on existing code base
01:16 logmsgbot: demon synchronized php-1.17/wmf-config/CommonSettings.php 'Temporarily disabling UploadWizard per Neil K while a bug is being worked out.'
21:02 Ryan_Lane: powercycling srv281 since it's dead
21:01 Ryan_Lane: powercycling srv247 since it's dead
20:56 Ryan_Lane: mobile has stabilized, moving weights back to normal
20:50 Ryan_Lane: lowering weight on mobile1 and mobile2
20:45 Ryan_Lane: restarted apache on mobile2.
19:16 mark: Moved all inter-subnet routing from csw5-pmtpa to csw1-pmtpa
19:16 mark: Reconfigured pybal on lvs3 to BGP connect to csw1-sdtpa instead of csw5-pmtpa, and restarted it
17:53 mark: Moved vlan 2 routing from csw5-pmtpa:ve2 to csw1-pmtpa:ve2
16:27 apergos: or at least srv254.
16:25 apergos: nagios is whining about some services that are actually in operation (eg srv250)
16:07 Ryan_Lane: or not, mgmt interface isn't letting me in
16:06 Ryan_Lane: powercycling srv250
13:56 Tim: removed myself from SMS notification in nagios, since the configuration is so broken that it sends me a text every 10 minutes day or night
13:49 RobH: nagios sms pages seem to be delayed from the actual service outage by a long, long delay
13:10 Tim: did the above in both puppet and nagios directly since puppet seems to be still broken
13:07 Tim: increased the number of retries of payments.wikimedia.org before sending an SMS from 1 to 20
02:02 notpeter: accidentally deleted sockpuppet:/var/lib/puppet/ssl, am regenerating keys and will begin signing soon
May 9
23:36 notpeter: so annoying. doubling puppet freshness check timing to 4 hours.
22:01 logmsgbot: awjrichards synchronized php-1.17/wmf-config/InitialiseSettings.php 'Disabling dashboard on enwiki pending fixes to data collection script'
18:01 RoanKattouw: Running schema change (adding index of aa_timestamp field on enwiki.article_feedback table) on s1 by directly running it on the master. Table size is ~275k rows
17:56 logmsgbot: catrope synchronized php-1.17/wmf-config/InitialiseSettings.php 'Set $wmgArticleFeedbackLotteryOdds to 0.005 on enwiki. This enables AFT on 0.5% of all enwiki main namespace pages'
13:26 RoanKattouw: Deploying r87712, which should fix the ResourceLoader breakage in IE
13:12 apergos: restarted squid back-end on amssq37 (hogging memory)
08:06 apergos: rebooting srv198 for testing
07:52 apergos: rebooting srv281 for testing
May 8
22:46 RobH: fixed deep linking issue for uploads from old techblog. merge did not copy, had to manually pull from singer and also add additional rewrite rule to apache via puppet
22:40 RobH: copied over techblog upload data, which upon further review was not copied by the merge plugin as expected
18:29 RobH: updated categories on blog to remove testblog url entry
18:20 RobH: updating dns to revert bad change on blog propagation, split wikimediafoundation template files to themselves, were soft linked to wiktionary.org
18:05 RobH: blog redirects are in place for projects, not sure why or what added them, as they do not belong. fixed hooper via puppet to handle blog.wikimediafoundation.org
15:46 RobH: added exim simple mail sender to puppet entry for hooper
May 7
20:31 RobH: hooper running back on puppet, with all config files being updated normally
18:11 RobH: tinkering with redirects on hooper. disabled puppet daemon while doing local apache config changes, once tested will push up to puppet and reenable it on hooper
18:08 RobH: blog dns updated enough for both myself and guillom to test basic blog function. all basic use seems fine. still testing more advanced use cases
16:58 RobH: updated dns for blog and techblog. may take an hour or so to propagate, cannot update new blog until it resolves for authors to it. (old blogs will slowly cycle out of dns)
16:40 RobH: blog migration moving along, all data migrated, basic rewrites and such are in place, final review and cleanup in progress before changing testblog.w.o to blog.w.o in database and updating DNS
16:03 RobH: testblog.w.o currently now running 'master' database for the wmf blog. online as testblog, can do quick db replace when it's ready to go online
15:57 RobH: starting the copy of the main blog, any changes to blog.w.o after this time will be lost in the migration to the new host
15:53 RobH: updating blog software on hooper, installing all extensions before connecting to database copy of newsblog (not yet copied)
15:37 RobH: wiping out testblog on hooper and getting basic software installed for new blog deployment. database not yet copied, old blogs still 'primary'
12:55 logmsgbot: ariel synchronized php-1.17/wmf-config/CommonSettings.php 'raise account throttle for el wiki for workshop'
09:20 rainman-sr: deleted some old logs on search11 to free up space
May 6
22:46 logmsgbot: pdhanda synchronized php-1.17/wmf-config/InitialiseSettings.php 'Bug 28773. Modified wgAddGroups and enabled subpages for stewardwiki'
20:39 Ryan_Lane: added apaches::service to imagescalers class in puppet to ensure that sync-common will run before apache is started on the imagescalers
18:10 awjr: enabling drupal/civicrm logging on civicrm.wikimedia.org (forgot to turn this back on post-upgrade)
17:41 notpeter: killing the shit out of puppet to clear its cache.
17:38 logmsgbot: pdhanda synchronized php-1.17/wmf-config/InitialiseSettings.php 'Bug 28791. Fixed logo for piwiki'
21:33 awjr: performing CiviCRM upgrade (3.1.6->3.4.0) on grosley
21:24 awjr: upgrade to drupal 6.20 complete for civicrm.wikimedia.org
21:10 awjr: replacing /srv/org.wikimedia.civicrm dir on grosley with svn co from new deployment branch in wikimedia repo (/branches/deployment/fundraising-civicrm/d620c34)
20:51 awjr: disabling civicrm and drupal-related cron/hudson jobs
19:43 logmsgbot: catrope synchronized php-1.17/wmf-config/CommonSettings.php 'Fix config bug that caused tracking to fail for AFT, and bump tracking version'
19:18 mark: powercycled ersch
15:40 RobH: typo asw-b8
15:39 RobH: swapping msw1-eqiad fan tray with asw-a8-eqiad fan tray
05:58 logmsgbot: midom synchronized php-1.17/wmf-config/db.php 'adding srv170 after hw reset'
23:46 Tim: doing hard reset of alsted, it's not sending ssh version strings and not allowing logins by serial
22:17 logmsgbot: pdhanda synchronized php-1.17/wmf-config/InitialiseSettings.php 'Disabled option to allow blocked users to edit their talk pages on itwiki. See Bug 9073'
21:02 Ryan_Lane: created Ambassador-announce-l list
20:52 richcole1: replacing memory in srv281
20:49 RobH: srv281 shutdown for memory replacement
20:41 logmsgbot: pdhanda synchronized php-1.17/wmf-config/InitialiseSettings.php 'Enable by default, option to enable blocked users to edit their own talk page'
20:34 richcole1: replacing bad drive in srv284
20:08 Ryan_Lane: powercycling srv281
20:03 RobH: ms4 is powered down, was already offline and unresponsive to console.
17:47 logmsgbot: awjrichards synchronized php-1.17/wmf-config/CommonSettings.php 'Preparing conf for ArticleFeedback with way less click tracking to prevent cluster breakage'
17:40 RobH: etherpad.proxy to 000-etherpad.proxy on hooper apache2 enabled, should fix
17:34 RobH: hooper vhost for testblog back online
17:17 RobH: rolled back testblog vhost from hooper
17:01 RobH: testblog enabled for initial setup on hooper without disrupting etherpad service (on either of its url redirects)
16:56 RobH: updating etherpad.proxy apache config to include eiximenis serveralias and enabling the testblog vhost
16:48 Reedy: Manually pinged CR update script, had got stuck - 2 revisions. Let me know if it does it again
20:38 logmsgbot: pdhanda synchronized php-1.17/wmf-config/InitialiseSettings.php 'enabled CustomUserSignup extension for testwiki'
20:37 logmsgbot: pdhanda synchronized php-1.17/wmf-config/CommonSettings.php 'enabled CustomUserSignup extension for testwiki'
20:32 awjr: disabled wmf_premiums module on civicrm.wikimedia.org - it is currently not being used and potentially causes queue consumption to choke with certain contribution messages
19:51 logmsgbot: catrope ran sync-common-all
19:50 RoanKattouw: Running sync-common-all to deploy UploadWizard changes
17:52 pdhanda: Running maintenance/populateParentId.php on all wikis
08:21 Andrew: sync-common-all worked. scap still broken
08:21 logmsgbot: andrew ran sync-common-all
08:21 Andrew: trying sync-common-all
08:19 Andrew: syncs are broken, log littered with XXX: [sudo] password for andrew:
08:12 Andrew: re-scapping, typo in extension-list
08:12 logmsgbot: andrew synchronizing Wikimedia installation... Revision: 86895:
08:11 Andrew: Scapping to enable DisableAccount extension
08:11 logmsgbot: andrew synchronizing Wikimedia installation... Revision: 86895:
08:02 logmsgbot: andrew synchronizing Wikimedia installation... Revision: 86895:
08:02 Andrew: running scap to deploy the code itself
08:01 Andrew: deploying DisableAccount extension to checkuserwiki, stewardwiki, arbcom_enwiki since the special page was removed without consulting Philippe
02:15 logmsgbot: robh synchronized php-1.17/wmf-config/InitialiseSettings.php 'adding settings for checkuser and steward wikis'
April 25
23:33 Ryan_Lane: added python-mwclient to lucid repo
21:36 RobH: storage2 still offline, wont boot into os, but is remotely accessible
21:20 RobH: trying to fix storage2
20:16 notpeter: actually adding everyone on ops to watchmouse service... didn't know this had not already been done.
20:02 RobH: updated csw1 to remove labels and move ports 11/12, 11/14, 11/19, & 11/21 to default vlan. old connection ports for dataset2, tridge, ms1, and ms5
19:53 RobH: the datacenter is looking awesome.
19:45 RobH: ms1 moved from temp network to permanent home, no downtime, responding fine
19:42 RobH: ms5 connection moved, no downtime, responds fine, less than 4 seconds
19:40 RobH: updated csw1-sdtpa 15/1,15/2 from vlan 105 to vlan 2, 15/3 and 15/4 from vlan 105 to 101
18:52 RobH: snapshot4 relocated to new home, ready for os install
18:42 RobH: db19 and db20 back online (not in services as they have other issues)
18:39 RobH: db19 and db20 powering back up
18:25 RobH: virt4 experienced an accidental reboot when rebalancing power in the rack, my fault, not the hardware
18:12 RobH: rack b2 power rebalanced
18:01 RobH: db19 set to slave, depooled in db.php, no other services evident, shutting down (mysql stopped cleanly)
18:00 RobH: db20 shutdown
18:00 RobH: didn't log that i set up ports 11/38-40 for db19, db20, and snapshot4 on csw1-sdtpa. tested out fine and all my major configuration changes on network should be complete
17:56 RobH: ok, db20 and db19 are coming offline to relocate their rack location due to power distro issues
15:47 RobH: delay, not coming down yet, need more cables
15:46 RobH: db19 is coming down as well, it is depooled anyhow
15:46 RobH: db20 is coming down, ganglia aggregation for those hosts may be delayed until it is back online.
15:21 RobH: relocating snapshot4 into rack c2, it will be offline during this process
15:20 RobH: db43-db47 network setup, sites not down, yay me
15:10 RobH: being on csw1 makes robh nervous.
15:09 RobH: labeling and setting up ports on 11/33 through 11/37 on csw1-sdtpa for db43 through db47
14:47 RobH: fixed storage2 serial console (set it to higher rate, magically works, or it just fears me) and also confirmed its remote power control is functioning
14:42 RobH: stealing dataset1's known good scs connection to test storage2. dataset1 service will remain unaffected.
April 24
21:30 Ryan_Lane: restarting apache on mobile1
15:35 RobH: swapping bad disk in db30, hotswap, should be fine
14:36 RobH: swapping out the management switch in c1-sdtpa. msw-c1-sdtpa will be offline, so the mgmt interfaces of servers in that rack will be offline. all normal services will remain unaffected.
April 23
22:31 RobH: no drives display error leds, further investigation required
22:27 RobH: ms2 is having a bad drive investigated. if we do this right, it won't go down; if we don't, it will. it is a slave es server.
22:00 RobH: singer returned to operation, blog, techblog, survey, and secure returned to normal operation
21:52 RobH: singer is once again coming back down for drive replacement. This will take offline blog.wikimedia.org, techblog.wikimedia.org, survey.wikimedia.org, and secure.wikipedia.org. Service will be returned as soon as possible.
21:19 RobH: singer back online for a while; it will come back down for further repair shortly.
21:05 RobH: singer going down, blogs will be offline, so will secure, system will return to service as soon as possible
21:00 RobH: preparing to fix the dead drive in singer, this will offline secure, blog, techblog, and survey during the drive replacement process
19:49 mark: Upgrading mr1-pmtpa to junos 10.4R3.4
17:49 RobH: migrating searchidx1 & search1-search10 to new ports in same rack. moving one at a time and ensuring link lights between moves. (already tested with search10)
14:11 RobH: db19 is back online; it does not seem to have any mysql setup done.
14:02 RobH: restarting db19
14:02 RobH: arcconf confirms that all drives on db19 are indeed working, as rich found earlier
12:47 mark: Added (x121Address=1) condition to the LDAP query of the ldap_aliases router on mchenry's exim
00:32 hcatlin: Mobile: Deploying fix to an issue that kept the standard-style Main_Page from displaying on mobile
00:25 Ryan_Lane: restarting memcached on all of the mobile servers
00:23 Ryan_Lane: repooling mobile3, since mobile will die without it (fun!!)
00:17 Ryan_Lane: depooling mobile3
00:13 Ryan_Lane: restarting apache on mobile3
00:10 Ryan_Lane: puppet was broken on mobile1, reinstalled it
April 22
23:56 domas: detached gdb from srv193 apache, apparently it was used for something
23:14 notpeter: restarting nagios (again)
22:43 notpeter: restarting nagios
19:23 apergos: shot all stopped rsyncs on ms5 (that were copying from ms4 about two weeks ago), changed all perms on the directories they had reached so thumbs can be served/read from them.. oh. not me, someone else must have done it, I'm not here :-P
19:02 RobH: ms4 shutting down for memory troubleshooting
19:02 domas: bumped up maxclients/serverlimit on singer to 350 (up from 150), set maxrequestsperchild to 30 to avoid heap blowup (down from 0), all governed via apache2/conf.d/maxrequests
18:25 Ryan_Lane: restarting apache on singer
18:19 Ryan_Lane: applying system patches to raskin
15:33 RobH: lowered maxclients more on singer; it's going to slow down secure, but hopefully keep it online longer.
15:04 Ryan_Lane: downgraded sq71 to squid-2.7.7
14:51 mark: Pulled squid 2.7.9 packages from the lucid-wikimedia reprepro APT repository, reinstated 2.7.7
14:34 mark: Downgrading squid on sq66
14:15 mark: Stopping squid on sq72
14:14 RobH: reduced singer apache maxclients from 400 to 200, hopefully will reduce singer crashes
14:09 RobH: singer died, rebooting.
14:07 RobH: secure and blogs on singer are bouncing due to singer cpu maxing out
13:53 RobH: singer threw an invalid cert warning and a cpu spike was occurring; restarting apache and load is normalizing
11:34 logmsgbot: catrope ran sync-common-all
11:33 RoanKattouw: Running sync-common-all to deploy r86464
11:11 logmsgbot: catrope synchronized php-1.17/includes/Exif.php 'Fix for bug 28615 (fatal on image description page)'
00:41 awjr: deployment of r86441 to payments servers complete
00:37 awjr: deploying changes to send full donor address information to PayflowPro (r86441) to all payments servers
00:31 awjr: re-deploying changes to send full donor address information to PayflowPro on payments2.wikimedia.org (suspected cause of previous issue: local browser caching)
22:57 awjr: reverting to r84367 of deployment branch on payments2.wikimedia.org due to bad address information appearing in civicrm
22:53 awjr: deploying changes to send full donor address information to PayflowPro on payments2.wikimedia.org
22:20 Ryan_Lane: virt1-4 installed
20:54 Ryan_Lane: powercycling singer
20:53 Ryan_Lane: restarting apache on spence
20:52 mark: powercycled singer
20:35 Ryan_Lane: powercycling srv196
17:31 Ryan_Lane: finished ams and kn text squids
17:17 RobH: change live, etherpad still online and functional
17:16 RobH: pushing change to hooper so it doesn't kill etherpad when other apache vhosts exist
17:16 RobH: updated puppet entry for etherpad vhost
16:54 Ryan_Lane: upgrading squid on all squid systems, slowly, over the course of the day
15:09 mark: Killed all etherpad-user processes and started etherpad
15:09 RoanKattouw: Removing data for the section edit link experiment from the clicktracking table. Data is backed up in /home/catrope/sel . This will delete 8.4M of the 9.6M rows in the clicktracking table
15:05 RoanKattouw: Etherpad is broken, serving 500s. Have e-mailed Peter
08:43 Tim: running /home/wikipedia/bin/l10nupdate to get r86294
April 17
20:27 mark: Setup dirty "pseudowire" between pmtpa and eqiad; a GRETAP tunnel inside a v4-in-v6 tunnel bridged to eth1/eth0.666 on 2 linux boxes (sq71 on pmtpa side). Now running OSPF on it to get full v4 interconnectivity
April 16
08:33 apergos: disabled snaps on searchidx in rainman's crontab, moving some en.wiki.* files in /a/search/indexes/snapshot off to fenari:/home/ariel/searchidx/....
April 15
18:41 mark: Rsynced private thumbs dir from ms4 to ms5
15:45 mark: Upgraded re0.cr1-eqiad to JUNOS 10.4R3.4
15:27 mark: Switching over routing engines from re0 to re1 on cr1-eqiad
15:23 mark: Upgraded re1.cr1-eqiad to JUNOS 10.4R3.4
14:44 mark: Upgraded re0.cr2-eqiad to JUNOS 10.4R3.4
14:34 mark: Upgraded re1.cr2-eqiad to JUNOS 10.4R3.4
09:57 apergos: moving some old (from 2009) dirs from /a/search/indexes/import on searchidx1 to /home/ariel/searchidx1/... on fenari, if I'm told we can toss em outright I'll do that later.
06:01 apergos: cleared out space on searchidx1 again. we only have 4.5gb free and it's still at 100%. out of tricks...
00:54 notpeter: py making some changes to nagios. subscribing the service to lots of files. also, lots more files being pushed out by puppet.
April 14
23:41 Ryan_Lane: adding reverse mappings for virt1-4
23:28 Ryan_Lane: adding dns entries for virt1-4
22:28 logmsgbot: catrope synchronized php-1.17/wmf-config/InitialiseSettings.php 'Re-enable ArticleFeedback on enwiki, upgrade is complete'
22:09 RoanKattouw: Running populateAFRevisions.php on enwiki
22:06 RoanKattouw: Creating article_feedback_revisions table on enwiki
22:04 RoanKattouw: Clearing message blobs
22:03 logmsgbot: catrope ran sync-common-all
22:03 RoanKattouw: Deploying ArticleFeedback changes to the cluster with sync-common-all
22:02 logmsgbot: catrope synchronized php-1.17/wmf-config/InitialiseSettings.php 'Disable ArticleFeedback on enwiki for upgrade'
21:42 mark: Added vlan 103 to ports e 15/5 to 15/8 (virt1-4 eth1) (tagged) on csw1-sdtpa
21:42 mark: Put ports e 15/1 to 15/4 (virt1-4 eth0) in vlan 105 (untagged) on csw1-sdtpa
21:40 mark: Created VLAN 105 ("virt-hosts") on csw1-sdtpa, assigned virtual interface ve 9, assigned ip 10.4.16.0/24
21:39 mark: Added virtual interface ve 7 ("virt") on csw1-sdtpa, assigned ip 10.4.0.1/24
21:39 mark: Removed virtual interface ve 7 ("virt") on csw5-pmtpa
21:24 Ryan_Lane: cleaning up searchidx1
21:11 RoanKattouw: Running extensions/ArticleFeedback/populateAFRevisions.php on testwiki
21:11 RoanKattouw: Creating article_feedback_revisions table on testwiki
21:08 RoanKattouw: Updating ArticleFeedback code on test
16:18 RobH: singer back online, blogs, survey, and secure should all be back in service
16:15 RobH: blogs offline due to singer outage, as well as survey, working on resolution
16:14 RobH: powercycling singer, unresponsive to ssh and to serial console
16:13 RobH: secure issue is being investigated
16:13 pdhanda: running cleanupTitles.php for all wikis in all.dblist
16:13 RobH: investigating singer cpu spike
15:36 mark: Implemented LDAP lookups on mail relay mchenry
14:13 Reedy: Delayed message. Ran repopulateCodePaths on mediawikiwiki
14:13 Reedy: Delayed message. Ran populateFollowupRevisions on mediawikiwiki
14:04 logmsgbot: catrope synchronized php-1.17/extensions/CodeReview/modules/ext.codereview.js 'Disable comment field autofocus'
19:54 awjr: disabling drupal/civicrm cron on contacts.wikimedia.org (singer) as part of an interim solution to civimail problems in CiviCRM 3.4beta1
18:31 ^demon: ran code review updates for mediawikiwiki
12:54 mark: Restarted pdns on linne
12:54 mark: Defined /64 neighbor blocks in 1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa in DNS
12:31 mark: Defined /31 neighbor blocks in 154.80.208.in-addr.arpa in DNS
09:15 logmsgbot: ariel synchronized php-1.17/wmf-config/CommonSettings.php 'increase account creation throttle value for el wiki for editing workshop (soon we have a two week break from these :-P)'
01:55 Ryan_Lane: upgrading squid to 2.7.9-5 on sq58 and sq73
01:35 Ryan_Lane: upgrading squid to 2.7.9-4 on sq58 and sq73
00:42 Ryan_Lane|food: also removing sheepdog from nova-*.tesla
00:39 Ryan_Lane|food: removing natty versions of libvirt and friends and re-installing the lucid versions
April 12
20:18 RobH: killed apache vhost for outreachcivi since it appears defunct in favor of contactscivi on singer
19:08 Ryan_Lane: upgrading nova-* on nova-*.tesla
19:05 Ryan_Lane: updating squid on sq73 to test new squid and patches
05:51 Ryan_Lane: purging nova-ajax-console-proxy from nova-compute1/2.tesla and adding to nova-controller.tesla
04:57 Ryan_Lane: purging nova-ajax-console-proxy from nova-controller.tesla and adding to nova-compute1/2.tesla
03:21 logmsgbot: aaron synchronized php-1.17/wmf-config/InitialiseSettings.php 'flaggedrevs for kawiki'
03:13 AaronSchulz: Enabled FlaggedRevs for Georgian Wikipedia
03:11 logmsgbot: aaron ran sync-common-all
April 10
05:22 logmsgbot: ariel synchronized php-1.17/wmf-config/CommonSettings.php 'increase account creation throttle value for el wiki for editing workshop (can't wait for local sysadmins to be able to do this :-P)'
April 9
17:07 logmsgbot: ariel ran sync-common-all 'sync for elwikinews round 2, let's get the import right this time folks cause this is too nerve-wracking'
16:40 logmsgbot: ariel synchronized all.dblist 'remove elwikinews, need to drop and recreate after borked import'
16:29 logmsgbot: robh synchronized php-1.17/wmf-config/InitialiseSettings.php 'logo updates for a couple wikis for phillipe'
10:10 mark: Changed dataset1's clock source to HPET, synced it with ntpdate and restarted ntpd
07:58 apergos: power cycling searchidx1, load was at 60, unresponsive to commands after login from mgmt console
02:46 RobH: troubleshooting a couple new wikis, had to sync-apaches and restart them gracefully
01:01 notpeter: changed my.cnf on storage3 to replicate-do-db= drupal,mysql,civicrm
00:50 Ryan_Lane: installing nova-ajax-console-proxy on nova-controller.tesla
17:10 notpeter: pushing out new dns zones. forgot to change ptr record for yvon...
15:09 RobH: updating dns with testblog info
13:36 mark: Added swap on /dev/sdc1 and /dev/sdd1 on ms5
13:34 mark: Stopped RAID10 array /dev/md2 again, sync takes too long
13:30 mark: Created RAID10 array for swap across first partition of 46 drives on ms5
13:21 mark: Stopped all rsyncs to investigate ms5's sudden kswapd system cpu load
07:57 apergos: assigned snapshot1 internal ip in dns
06:11 apergos: moving snapshot1 to internal vlan etc
04:15 notpeter: pushing new dns w/ eixamanius as cname for hooper and yvon as new name for box that was previously eixamanius
04:15 notpeter: stopping etherpad
April 7
21:43 notpeter: removed a silly check for hooper that I made and restarted nagios
19:06 Ryan_Lane: switching openstack deb repo back to trunk, and upgrading packages on nova-controller, since we are likely to target cactus now
15:40 mark: Restarted rsyncs
15:26 mark: Created a test btrfs snapshot of /export on ms6
15:12 mark: Temporarily stopped the rsyncs on ms5 to test zfs send performance
13:56 mark: Reenabled ms6 as backend on esams.upload squids
13:11 apergos: replaced ms4 in fstab on fenari with ms5 so we have thumbs mounted there
12:08 mark: Restarted rsyncs on ms5
12:07 apergos: nginx conf file change to "thumb" added to puppet
12:00 mark: Removed the test snapshot on ms5
11:47 apergos: edited in place /etc/nginx/sites-available/thumbs and /export/thumbs/scripts/thumb-handler.php to make thumbs generated on the fly return 200. they were returning 404
08:23 logmsgbot: catrope synchronized php-1.17/extensions/UploadWizard/UploadWizard.php 'Fix fatal due to missing API module'
08:17 logmsgbot: catrope ran sync-common-all
08:15 RoanKattouw: Deploying UploadWizard for real this time, forgot to svn up first. sync-common-all then clearMessageBlobs.php
08:14 RoanKattouw: Commenting out srv196 in mediawiki-installation node list because its timeouts take forever
08:10 RoanKattouw: srv196 is not responding to SSH or syncs from fenari (they time out after a looong time) but Nagios says SSH is fine. Should be fixed or temporarily depooled
08:08 RoanKattouw: Clearing message blobs
08:07 logmsgbot: catrope ran sync-common-all
08:04 RoanKattouw: Scap broke with sudo stuff AGAIN, running sync-common-all
08:01 RoanKattouw: Running scap to deploy UploadWizard changes
07:11 apergos: turned em off again, started seeing timeouts. bah
06:39 apergos: and two more...
06:31 apergos: restarted two of the 8 rsyncs on ms5, keeping an eye on them
01:31 domas: added nobarrier to xfs mount options on db32 and db37
April 6
20:38 RobH: updated puppet with a svn::client class (rt#721)
20:18 RobH: pulled wm09schols, wm10schols, and wm10reg out of enabled sites on singer
20:05 apergos: suspended all rsyncs on ms5, we were seeing nfs timeouts on the renderers all of a sudden
18:50 apergos: killed morebots and let the restart script start it up again
April 5
23:00 Ryan_Lane: restarting search indexer on searchidx to free space held by deleted logs
22:58 Ryan_Lane: clearing up some space on searchidx1
22:20 notpeter: crammed an etherpad db into db9's mysql hole.
17:57 Ryan_Lane: restarting lsearchd on all search boxes
17:45 RoanKattouw: Restarted morebots, running on wikitech as catrope
17:45 Ryan_Lane: changing the udp log location for search to emery
12:16 logmsgbot: catrope synchronized php-1.17/wmf-config/InitialiseSettings.php 'Undo $wgForceUIMsgAsContentMsg change on incubator from last night per DannyB'
April 4
23:28 Ryan_Lane: uploading ircecho package to lucid-wikimedia repo, for nagios irc bot
22:22 Ryan_Lane: upgrading wikimedia-task-appserver package on srv281
22:22 Ryan_Lane: uploading new version of wikimedia-task-appserver to lucid-wikimedia repo; merges back in 1.17 changes that were missing
22:04 RobH: updated noc robots entry in its apache config on fenari
21:58 Ryan_Lane: srv281 is acting as a temporary scaling server for testing of lucid imagescalers, and to help with thumbs load.
21:27 Ryan_Lane: depooling srv281 from appservers
21:21 Ryan_Lane: syncing apaches to get configuration pushed to srv281
21:16 Ryan_Lane: rebooting srv281
21:01 Ryan_Lane: adding srv281 to rendering cluster in pybal via fenari
20:32 Ryan_Lane: uploading a new version of wikimedia-task-appserver fixing a problem with sync-common
20:13 logmsgbot: catrope synchronized php-1.17/wmf-config/InitialiseSettings.php 'Add mainpage to $wgForceUIMsgAsContentMsg for incubatorwiki'
19:55 Ryan_Lane: srv281 successfully ran imagescaler puppet class. ready for testing.
19:47 Ryan_Lane: adding php5-fss to lucid-wikimedia repo
19:11 Ryan_Lane: adding wikimedia-task-appserver to lucid-wikimedia repo
18:58 RobH: bugzilla updates complete
18:50 RobH: updating bugzilla per rt#718 bz#28409 bz#28402
18:42 notpeter: added cname etherpad for hooper.wikimedia.org
18:00 Ryan_Lane: added the wikimedia-fonts package to lucid-wikimedia repo
17:29 notpeter: adding self to nagios group. rebooterizing nagios.
05:58 apergos: cleaned up perms on commons/thumb/a/af, left over from interrupted rsync test last night
05:50 logmsgbot: tstarling synchronized php-1.17/wmf-config/InitialiseSettings.php 'enabling pool counter on all wikis'
04:12 logmsgbot: tstarling synchronized php-1.17/wmf-config/InitialiseSettings.php 'enabling PoolCounter on testwiki and test2wiki'
01:22 Tim: apache CPU overload lasted ~10 mins, v. high backend request rate, don't know cause, seems to have stopped now
April 3
18:42 apergos: 8 rsyncs of ms4 thumbs restarted with better perms so scalers can write... in screen as root on ms5. If we start seeing nfs timeouts in the scaler logs please shoot a couple
17:14 mark: Deployed max-connections on all cache peers for esams.upload squids to their florida parents (current limit 200)
17:00 mark: Removed the carp weights on the esams backends again, as the weighting was completely screwed up
16:59 mark: Started knsq13 backend
14:27 logmsgbot: catrope ran sync-common-all
14:26 RoanKattouw: Running sync-common-all to deploy r85256
13:03 apergos: shot rsyncs on ms5, setting 777 dir perms on all thumbnail dirs (eg e/ef/blablah.jpg) so scalers can write into them
12:53 apergos: did same for rest of projects and subdirs (777 on hash dirs)
12:46 apergos: chmod 777 on commons/thumb/*/* on ms5 so that scalers can create directories in there (mismatch of uid apache vs www-data)
11:12 mark: Raised per-squid connection limit to ms5 from 200 to 400 connections
11:05 mark: Raised per-squid connection limit to ms5 from 100 to 200 connections
10:55 mark: Fixed squid loop, the pmtpa.upload squids were using the esams squids as "CARP parents for distant content"
10:29 mark: Fixed puppet on sq42/43
09:44 mark: Lowered FCGI thumb handlers from 90 to 60 again, to reduce concurrency
08:08 mark: Started 4 more rsyncs (8 total now)
07:49 mark: Removed mlocate from ms5, puppetising
07:42 mark: Started 4 rsyncs from ms4 to ms5 (--ignore-existing)
07:32 mark: increased thumb handler count from 60 to 90
07:11 mark: Doubled the amount of fcgi thumb handlers
07:08 mark: Turned off logging of 404s to nginx error.log
06:50 mark: Restarted Apache on the image scalers
06:49 mark: Reconfigured ms5 to use the 404 thumb handler
06:48 Ryan_Lane: disabling nfs on ms4
06:33 mark: Running puppet on all apaches to fix fstab and mount ms5.pmtpa.wmnet:/export/thumbs
06:32 mark: Unmounting /mnt/thumbs on all mediawiki-installation servers
06:30 mark: Remounted NFS /mnt/thumbs on the scalers to ms5
06:28 Ryan_Lane: bringing nfs back up
06:28 Ryan_Lane: brought ms4 back up. stopping the web server service and nfs
06:20 mark: Setup NFS kernel server on ms5
06:18 Ryan_Lane: powercycling ms4
05:29 Ryan_Lane: rebooting ms4 with -d to get a coredump
05:14 apergos: reenabling webserver on ms4 for testing
04:45 apergos: stopping web service on ms4 for the moment
04:29 apergos: shot webserver again
04:26 apergos: turned off hourly snaps on ms4, turned back on webserver and nfs
04:09 apergos: rebooted ms4, shut down webserver and nfsd temporarily for testing
02:58 apergos: still looking at kernel memory issues, still rebooting, ryan should be here in a few minutes to help out
02:03 apergos: ...a solaris advisor. also have zfs arc cache max at 2g, which is ridiculously low, but wtf, right?
02:02 apergos: set tcp_time_wait_interval to 10000 at the suggestion of...
01:37 apergos: lowered zfs arc max to 2g (someone should reset this later)... will take effect on next reboot
00:29 apergos: rebooting with the new zfs arc cache max value, which will reduce the min value as well... dunno if this will give us enough breathing room or not
00:24 apergos: set zfs arc cache max to a ridiculously low value of 4gb, since when it's healthy it's using much less than that (1gb); this will take effect on reboot
00:22 Reedy: Still experiencing MS4 issues, thumb service is likely to be problematic for most users
April 2
23:47 apergos: rebooting ms4 from serial console; it was out to lunch and took the renderers down too
20:51 notpeter: added self to /etc/nagios/contacts.cfg
20:32 logmsgbot: awjrichards synchronized php-1.17/extensions/UserDailyContribs/api/ApiUserDailyContribs.php 'r85088 updates to UserDailyContribs API to retrieve past year edit count'
13:37 logmsgbot: mark synchronized php-1.17/wmf-config/InitialiseSettings.php 'Disable ehcache'
11:43 mark: set max_stale 1 month on pmtpa.upload squids
11:21 mark: Set refresh_pattern on pmtpa upload squids
10:47 mark: Set refresh_pattern on sq85
00:21 Ryan_Lane: added drac management script and dependencies to fenari, via puppet
00:06 Ryan_Lane: installing python-paramiko on fenari
March 30
23:15 Ryan_Lane: adding puppet configuration for the collector and adding that class to spence
23:09 Ryan_Lane: updating udpprofile package again on spence
23:02 Ryan_Lane: upgrading udpprofile package on spence to fix init script
22:48 Ryan_Lane: killing the collector on spence, and trying to run via the package
22:48 Ryan_Lane: installing udpprofile package on spence
22:44 Ryan_Lane: created a package for the udpprofile code in svn; added to the main lucid-wikimedia repo
19:33 Ryan_Lane: fixed sanger certificate check for ldaps. warning will go away on next puppet run
17:05 Ryan_Lane1: enabled a nagios certificate check on sanger for ldaps
16:57 thedj: Maintenance ongoing on image server. Thumbnails on #Wikimedia sites may appear broken; we expect to restore full service shortly. (by Danese)
15:22 logmsgbot: ariel synchronized php-1.17/wmf-config/CommonSettings.php 'increase account creation throttle value for el wiki for editing workshop (maybe we should find a better solution?)'
13:03 mark: Started 8 parallel rsyncs of ms4 /export/thumbs/wikipedia/commons/thumb/{8..f} to ms5
10:50 Tim: restarted ehcache with increased cache size limits
21:41 logmsgbot: ariel synchronized php-1.17/extensions/Narayam/Narayam.i18n.php 'remove BOM from file, was being written to stdout'
19:13 apergos: running rsync for White Cat again to pick up en wiki 20110115 hist files, on dataset2 in screen session as root, please shoot after 3 days (check with user first in channel)
15:05 logmsgbot: mark synchronized php-1.17/wmf-config/db.php 'Repooled db38 with lower load (100 instead of 400)'
14:57 logmsgbot: mark synchronized php-1.17/wmf-config/db.php 'Depooling db38'
10:55 logmsgbot: midom synchronized php-1.17/wmf-config/mc.php 'srv206 to srv243 swap'
10:53 domas: I misread before. memcached check was broken for 23 days, not 23 hours. duct-taped by creating symlink php-1.17 -> wmf-deployment on spence, not sure for how long :)
02:45 logmsgbot: aaron synchronized php-1.17/wmf-config/InitialiseSettings.php 'Added missing autopatrolled group to siwiki'
02:29 logmsgbot: aaron synchronized php-1.17/wmf-config/InitialiseSettings.php 'Added missing rollbacker group to siwiki'
02:07 logmsgbot: aaron synchronized php-1.17/wmf-config/flaggedrevs.php 'DO NOT PUT ABUSEFILTER SETTINGS IN HERE'
02:06 logmsgbot: aaron synchronized php-1.17/wmf-config/InitialiseSettings.php 'Copy abusefilter right settings from flaggedrevs.php to here, where they belong'
16:28 logmsgbot: catrope synchronized php-fatal-error.html 'Update this file from SVN /trunk/tools/php/wmerrors/error.html . This also changes its chmod so people who are not Tim can change this file'
17:19 RobH: dataset1 back online for testing, apergos will take over later today/tomorrow, will not go into service just yet
14:47 logmsgbot: catrope ran sync-common-all
14:46 RoanKattouw: Running sync-common-all to deploy r84467, r84484, r84495 and r84530. Needed because directory additions don't work with sync-file AFAIK
06:16 logmsgbot: ariel synchronized docroot/www.wikimedia.org/google01afe16dd444713b.html 'authorize generic tech account for google webmaster tools (for use with google storage)'
00:28 Tim: removing old binlogs from db9
March 21
23:30 Ryan_Lane: upgrading nova-* packages on nova-*.tesla
23:30 Ryan_Lane: installing ajaxterm on nova-compute1/2
21:59 Reedy: dataset1 is getting another beating. You know it deserves it, right?
20:42 logmsgbot: jeluf synchronized php-1.17/wmf-config/InitialiseSettings.php '28129 - Enable subpages in main namespace on sawikisource'
20:37 logmsgbot: jeluf synchronized php-1.17/wmf-config/InitialiseSettings.php '28134 - Enable alias "WL" -> "Wikilajme" for Albanian Wikinews'
20:18 logmsgbot: jeluf synchronized php-1.17/wmf-config/InitialiseSettings.php '27256 - Correcting content page count at en.wikibooks and pt.wikibooks'
20:12 logmsgbot: jeluf synchronized php-1.17/wmf-config/InitialiseSettings.php '26281 - Please allow AbuseFilter on Hungarian Wikisource and Wikibooks'
12:35 RoanKattouw: Clearing and rebuilding LocalisationUpdate cache. Using altered version of /h/w/bin/l10nupdate which pushes updates to Apaches immediately, then clears message blobs
09:39 logmsgbot: catrope synchronized php-1.17/wmf-config/InitialiseSettings.php 'Add English Wikisource to import sources on sawikisource'
09:36 logmsgbot: catrope synchronized php-1.17/wmf-config/InitialiseSettings.php 'Change sitename too on orwiki, was untranslated'
09:35 logmsgbot: catrope synchronized php-1.17/wmf-config/InitialiseSettings.php 'Fix previous change - set $wgMetaNamespace, not $wgSitename'
09:33 logmsgbot: catrope synchronized php-1.17/wmf-config/InitialiseSettings.php 'bug 27948 - Set $wgSitename on orwiki'
13:01 logmsgbot: tstarling synchronized php-1.17/wmf-config/InitialiseSettings.php 're-enabling PoolCounter (was disabled during the previous breakage)'
12:59 logmsgbot: tstarling synchronized php-1.17/includes/PoolCounter.php 'r84323: do not show error to user on connection refused etc.'
12:25 Tim: attempted to restart poolcounterd with a different file descriptor limit. It didn't start up properly, causing every parse request to return an error. Fixed in about 30 seconds.
21:50 logmsgbot: jeluf synchronized php-1.17/wmf-config/InitialiseSettings.php '27842 - enable blocked user to edit talk page on hewikiquote'
21:48 logmsgbot: jeluf synchronized php-1.17/wmf-config/InitialiseSettings.php '27740 - give sysop the abusefilter-modify-restricted right in the Hebrew Wikipedia'
21:39 logmsgbot: jeluf synchronized php-1.17/wmf-config/InitialiseSettings.php '27840 - enable transwiki import to latin wikisource'
21:34 logmsgbot: jeluf synchronized php-1.17/wmf-config/InitialiseSettings.php '28092 - Add Portal namespace on Swedish Wikiversity'
21:32 logmsgbot: jeluf synchronized php-1.17/wmf-config/InitialiseSettings.php '28092 - Add Portal namespace on Swedish Wikiversity'
19:47 domas: fixed autocommit in wmf-config by uncommenting a cron entry in my crontab \o/
19:44 logmsgbot: jeluf synchronized langlist 'Added ltg and kbd'
19:00 awjr: Archived Hudson build logs for CiviMail-related Hudson jobs to storage3 (/archive/fundraising/hudson_builds) and cleared out related log dirs on Grosley. Hudson CiviMail-related builds now back to normal.
10:13 Tim: fixed punctuation in bugzilla quip 1155 by direct DB access
03:05 logmsgbot: tstarling synchronized php-1.17/wmf-config/InitialiseSettings.php 'enabling PoolCounter on all wikis'
03:04 Tim: poolcounterd CPU usage still negligible. Enabling on all wikis.
03:00 logmsgbot: tstarling synchronized php-1.17/wmf-config/InitialiseSettings.php 'enabling PoolCounter on en.wikipedia.org'
19:28 Ryan_Lane: added svg to allowed upload types for wikitech
17:51 logmsgbot: awjrichards synchronized php-1.17/wmf-config/CommonSettings.php 'Cleaning out cruft from CentralizeNotice config; Updating how https is handled for CentralNotice; PoolCounter settings inclusion in CommonSettings (but not checking in/enabling new Poolcounter settings in InitialiseSettings)'
06:41 Tim: running scap to get the PoolCounter extension
06:28 Tim: installed poolcounter on tarin
06:08 Tim: tarin upgraded to lucid
05:15 Tim: tarin just has an ancient test wordpress install on it which hasn't been touched since April 2010. Removing the website and mysql instance, will attempt to upgrade the distro.
03:20 logmsgbot: tstarling synchronized php-1.17/includes/HTMLCacheUpdate.php 'disable squid cache invalidation from HTMLCacheUpdate, to hopefully eliminate apache CPU spikes'
11:17 logmsgbot: hashar synchronized php-1.17/wmf-config/InitialiseSettings.php '27728 - Add WP as an alias to project namespace in Portuguese Wikipedia'
11:02 logmsgbot: hashar synchronized php-1.17/wmf-config/InitialiseSettings.php '27898 - Create Autor namespace at ca.wikisource'
10:52 logmsgbot: hashar synchronized php-1.17/wmf-config/InitialiseSettings.php '27898 - Create Autor namespace at ca.wikisource'
10:44 logmsgbot: hashar synchronized php-1.17/wmf-config/InitialiseSettings.php '27413 - Allow eswiki bureaucrats to add/remove the confirmed usergroup'
09:56 logmsgbot: hashar synchronized php-1.17/wmf-config/InitialiseSettings.php '27998 - Add WP alias on dawiki'
01:31 logmsgbot: midom synchronized php-1.17/wmf-config/db.php 'even out a bit'
00:07 domas: db16 write-behind cache failed yet again, changing states doesn't help much, battery seems to be dead ( FSA_EM_ENHANCED_BATTERY_CHANGE ? ) - set innodb log sync to 0
14:17 logmsgbot: hashar: shell madness is over for today.
14:06 logmsgbot: hashar synchronized php-1.17/wmf-config/InitialiseSettings.php '27195 - Grant anonymous users 'createpage' right and enable Extension'
14:01 domas: fixed *-filter segfaults on locke, probably :)
13:14 logmsgbot: hashar: Cleaning bugs in the Wikimedia product
13:04 logmsgbot: hashar: Reedy made me an admin on bugzilla. Then allowed myself to edit components.
12:27 logmsgbot: hashar synchronized php-1.17/wmf-config/InitialiseSettings.php '27971 - Change the translation of "user" and "user_talk" namespaces on ka.wiki'
12:14 logmsgbot: hashar synchronized php-1.17/wmf-config/InitialiseSettings.php '27361 - Update logo for Fiji Hindi Wikipedia'
12:05 logmsgbot: hashar: ran namespaceDupes.php on cywiki : Looks good!
21:10 logmsgbot: hashar: we have pending changes in db.php and CommonSettings.php . Please commit! :p
18:52 awjr: restarting dovecot on grosley in feeble attempt to resolve civimail issue
18:22 RoanKattouw: Put the 1.16wmf4 versions of skins/vector/main-{ltr,rtl}.css in /h/w/c/php/skins/vector as a temp hack so Squid-cached pages don't point to nonexistent CSS
18:18 Ryan_Lane: removing custom built memcached from lucid-wikimedia
17:03 Ryan_Lane: adding memcached to lucid-wikimedia repo
16:17 RoanKattouw: updateCollation.php has finished on s4 (commonswiki) and s6 (fr+ja+ruwiki). Now only running on s1 (enwiki)
12:31 RoanKattouw: updateCollation.php finished on s3
11:55 RoanKattouw: updateCollation.php finished on s5 (dewiki) around 11:00 UTC
11:52 logmsgbot: catrope synchronized php-1.17/extensions/WikimediaMobile/WikimediaMobile.php 'Style version bump'
11:50 logmsgbot: catrope synchronized php-1.17/extensions/WikimediaMobile/MobileRedirect.js 'Live hack mobile redirect script to fix JS errors'
10:10 RoanKattouw: cl_collation!='uppercase' was causing a full table scan on categorylinks on dewiki and frwiki. FORCE INDEX didn't help, so I live-hacked the script to do cl_collation='' instead. This makes the script run noticeably faster across the board
10:07 RoanKattouw: updateCollation.php has finished on s7
22:41 logmsgbot: awjrichards synchronized php-1.17/wmf-config/CommonSettings.php 'Adding openZIM as file format for Collection extension'
21:47 RoanKattouw: Restarted all collation update jobs with a more efficient SELECT query (with STRAIGHT_JOIN) and a smaller batch size (50 instead of 1000)
21:18 RoanKattouw: Restart updateCollation.php on s4 (commons) with a batch size of 50 to see how that plays out
21:07 RoanKattouw: Move complete. Aborted updateCollation.php on fenari, now running 7 instances of /home/catrope/collatecluster in a screen on hume
21:01 RoanKattouw: Breaking out updateCollation.php run into one process per DB cluster, and running them on hume rather than fenari
20:42 logmsgbot: catrope synchronized php-1.17/includes/CategoryPage.php 'Experimental fix by Aryeh for category sorting problems'
15:16 RoanKattouw: Updating MediaWiki on fenari from SVN to reintroduce category collation changes. For now, this is only for testing, so don't run sync-common-all or scap just yet
15:03 RoanKattouw: Looks fine
15:02 RoanKattouw: upgrade-1.17wmf1-final.php seems to have finished, running it again to verify it succeeded
13:28 logmsgbot: catrope synchronized php-1.17/wmf-config/InitialiseSettings.php 'NamespaceWithSubpages settings for Czech wikis, per Danny B'
12:44 mark: Changed 'wrr' scheduler to 'sh' on payments and owas LVS services
12:16 logmsgbot: midom synchronized php-1.17/extensions/FlaggedRevs/specialpages/PendingChanges_body.php 'syncing http://p.defau.lt/?9RQEZQiZ_WeMT7_f7TpX6A live too'
10:08 logmsgbot: midom synchronized php-1.17/wmf-config/db.php 'adding db33 back with snapshotting disabled - checking if that plus temptables were causing the melt'
08:53 logmsgbot: jeluf synchronized php-1.17/wmf-config/db.php 'removed db33 from s4'
05:51 Tim: deployed new jobs-loop.sh script which prioritises fast important jobs like email notification. Thousands of emails are currently being sent.
05:23 Tim: apt-get upgrade on prototype.wikimedia.org
05:05 logmsgbot: tstarling synchronized php-1.17/wmf-config/InitialiseSettings.php 'added filemover group to enwiki, as on commons, per request on WP:VPR'
04:33 Tim: re-running upgrade-1.17wmf1-final.php to do db34, db21 and db18
04:31 Tim: new master positions: db39-bin.000001:106, db29-bin.001:79, db37-bin.000001:106
21:44 RoanKattouw: Running sync-common-all to deploy r83487
21:41 logmsgbot: hashar synchronized php-1.17/wmf-config/InitialiseSettings.php '27875 - Create a new usergroup Patroller with 'autopatrol' and 'patrol' right on simple.wikipedia'
21:24 logmsgbot: hashar: subversion working copy is now clean :)
21:22 logmsgbot: hashar: committing stuff in wmf-commit since the svn repository is available again
18:54 logmsgbot: hashar synchronized php-1.17/wmf-config/InitialiseSettings.php 'Bug 27902 - Namespace fix on dawiki -- Help page talk too per IRC discussion'
15:02 hcatlin: Deployed to mobile again. Seems a little angry on ganglia.
14:28 hcatlin: Mobile deploy complete. Seems stable.
March 5
19:21 logmsgbot: catrope synchronized php-1.17/extensions/WikimediaMobile/WikimediaMobile.php 'Live hack to put mobile script back in <head>'
03:53 logmsgbot: andrew synchronized php-1.17/extensions/LiquidThreads/LiquidThreads.php 'Attempt to fix missing NSes on svwiktionary'
03:53 logmsgbot: andrew synchronized php-1.17/extensions/LiquidThreads/classes/Hooks.php 'Attempt to fix missing NSes on svwiktionary'
March 4
22:25 logmsgbot: hashar: Fixed outdated page by manually purging it (?purge + confirm). Issue resolved. Mark> you probably want to purge the squid/varnish caches for anything before 1.17 deployment :p
22:22 logmsgbot: hashar: page below is dated "Thu, 03 Feb 2011 12:18:28 GMT" HIT from amssq39.esams.
20:00 rainman-sr: modify lsearch-global-2.1.conf to point to an existing InitialiseSettings.php file (/home/wikipedia/common/php/wmf-config/InitialiseSettings.php). The file it was pointing to before was removed a couple of days ago, which broke the incremental indexer.
02:24 Tim: updated the scap scripts in wikimedia-task-appserver, now racing puppet to update the package everywhere
02:02 Tim: updated scap and sync-file and related scripts in /home/wikipedia/bin to work with 1.17
01:22 ^demon: removed requirement for comment on bug resolution
00:56 Tim: fixed /home/wikipedia/bin/mysql-list, it makes a list of mysql servers
00:31 Tim: moving a lot of old scripts out of /home/wikipedia/bin to junk
00:19 Tim: fixed subversion versioning in php-1.17/wmf-config, checked in 1.17-related changes into the local subversion instance
00:10 logmsgbot: root ran sync-common-all
00:09 Tim: sync-common-all
00:07 Tim: except fonts
00:03 Tim: moved some old junk out of /home/wikipedia/common to /home/wikipedia/junk/common: nagios-fedora-plugins, fonts, lockfiles, proxies
February 28
23:53 Tim: tarred up the old MediaWiki directories in /home/wikipedia/common and moved them out to /home/wikipedia/lazy-backups/nostalgia
23:31 Tim: created job-runners group with a list of servers with job runners
23:26 Tim: cleaned up /etc/dsh/group somewhat, moved old files to the junk subdirectory
22:42 apergos: removed the wm-job-runner package on srv151 through srv190, so job runners don't accidentally get started back up on those hosts
21:59 apergos: stopped job runner on hosts that also serve as ext stores (srv151-srv190) to see if that deals with the "Too many connections" issue seen today for connections to srv160 and srv161
21:10 Ryan_Lane: repooling sq68
21:05 Ryan_Lane: pushing out load.php fix for domains like arbcom.nl.wikipedia.org
15:32 apergos: apt-get clean on srv219 and 222 to keep them from whining as much about disk space
09:09 logmsgbot: hashar synchronized php-1.17/wmf-config/InitialiseSettings.php 'Bug 27240 - Request for localized logo to be effected at tpi.wikipedia.org'
February 25
22:36 apergos: job queue reported slower than molasses, checked the job runners and there was one host running something. I did not restart the job runners though because the package has not been updated for 1.17 and was using the pre-1.17 deployment codebase. Whoops.
13:59 mark: Powered down transcode3 (preparing for move)
13:30 mark: Reinstalled lvs2 with Lucid, preparing as new LVS balancer
13:02 mark: Removed BGP session for lvs2 on csw5-pmtpa
12:05 logmsgbot: catrope synchronized php-1.17/wmf-config/InitialiseSettings.php 'Add sidebar messages to $wgForceUIMsgAsContentMsg for mediawikiwiki'
11:50 logmsgbot: hashar: verified logos are live on kowikiquote and nnwikiquote. The latter needs a transparent background, user notified
11:47 logmsgbot: hashar synchronized php-1.17/wmf-config/InitialiseSettings.php 'logos for kowikiquote (bug 27548) and nnwikiquote (bug 27555)'
11:38 logmsgbot: hashar: running updateArticleCount on ttwiki for bug 27705
11:30 logmsgbot: hashar: srv169 & srv284 unreachable. Maybe they should be removed. srv266 still in timeout :(
11:29 logmsgbot: hashar: checked that the three previous logos are correctly deployed
11:26 logmsgbot: hashar synchronized php-1.17/wmf-config/InitialiseSettings.php 'logos for knwiki (bug 27657), sawiki (bug 27661) and orwiki (bug 27704). Please note they are png thumbnails of the .svg images'
February 24
21:18 logmsgbot: hashar synchronized php-1.17/wmf-config/InitialiseSettings.php 'bug 27495 - adding user namespaces aliases in portuguese for ptwiki. Default is brazilian variant.'
19:34 logmsgbot: catrope synchronized php-1.17/wmf-config/CommonSettings.php 'Another fix: apparently wikieditor-preview and wikieditor-previewDialog both exist, gotta hide both of them'
19:32 logmsgbot: catrope synchronized php-1.17/wmf-config/CommonSettings.php 'Hide preview dialog preference, this feature is unstable. Was hidden before but got exposed because the preference key changed'
16:24 RoanKattouw: Restarted morebots as catrope. robh still has a zombie instance running that I can't kill
February 23
21:24 RoanKattouw: Adding skins-1.17 symlink to all docroots that also have skins-1.5
21:21 logmsgbot: catrope synchronized php-1.17/extensions/LiquidThreads/classes/View.php 'Also stop LQT from including JUI CSS and add WikiEditor dialog CSS instead'
16:47 RoanKattouw: Deploying r82678 with individual syncs
16:30 logmsgbot: catrope synchronized php-1.17/wmf-config/CommonSettings.php 'Set account creation throttle to 300 for 194.64.239.* on elwiki. Requested by Ariel'
15:39 logmsgbot: catrope synchronized php-1.17/wmf-config/InitialiseSettings.php 'Enable $wgHtml5 on all wikis'
15:26 logmsgbot: catrope synchronized php-1.17/wmf-config/InitialiseSettings.php 'Enable $wgHtml5 on testwiki. If this goes well I will enable it on all wikis soon'
14:51 mark: Fixed puppet for lvs[234]
07:13 logmsgbot: tstarling synchronized php-1.17/wmf-config/CommonSettings.php 'removed hacks to increase memory_limit to 100MB in certain cases'
07:12 logmsgbot: tstarling synchronized php-1.17/wmf-config/CommonSettings.php 'increased memory_limit to 120MB'
23:54 logmsgbot: tstarling synchronized php-1.5/includes/DefaultSettings.php 'commenting out the $wgStylePath reference assignment, to try to fix the canary mismatch errors'
23:42 Tim: fixed logrotate on nfs2 and ran it with -f
16:06 logmsgbot: mark synchronized php-1.5/../php-1.17/includes/media/SVGMetadataExtractor.php
15:55 mark: Disabled apparmor profile for apache2 on srv178, which was spamming the log
15:09 mark: Temporarily disabled wikidiff on srv151 (puppet will reenable soon)
14:53 mark: Cleaned up srv178
04:39 logmsgbot: demon synchronized php-1.17/includes/parser/Preprocessor_DOM.php 'Sync wfErrorLog removals, was filling up /home'
04:15 apergos1: killed udp2log on nfs2 and restarted it via init script after removing preprocessorfatal.log again, got back 160 GB; we had gone down to 6.5 GB free on /home on fenari
02:47 Tim: re-enabled snaprotate, bug 27471 does not involve data loss
02:22 logmsgbot: LocalisationUpdate failed
01:59 Tim: redirected /home/wikipedia/common/php symlink to php-1.17
01:36 Tim: disabled snaprotate.pl temporarily while I investigate bug 27471. This has to be reverted within a few days or else the snapshots will fill
00:04 Tim: rebooting spence, its load is too high to let me get a terminal
February 16
21:17 logmsgbot: catrope synchronized php-1.17/wmf-config/db.php 'Depool srv178 from ES'
21:09 Ryan_Lane|dc: on nfs2
21:09 Ryan_Lane|dc: killing udp2log and starting it via init script
21:08 Ryan_Lane|dc: restarting udp2log on nfs2
21:07 Ryan_Lane|dc: restarting rsyslog on nfs2
21:05 Ryan_Lane|dc: restarting rsyslog on fenari
21:00 RoanKattouw: Removing /h/w/logs/preprocessorfatal.log , was 146GB
20:59 logmsgbot: catrope synchronized php-1.5/includes/parser/Preprocessor_DOM.php 'Remove debug logging, 146 GB of the same error message over and over again is enough'
20:39 RobH: srv178 powered down, full disk, preventing syncs, needs to be powered backup and fixed later
20:12 hashar: Freenode seems down. I can connect using IPv6 though :)
hashar: Most probably the IPv6 server had a netsplit.
20:07 RoanKattouw: Changed /home/wikipedia/bin/l10nupdate (L10Update cron job) to use php-1.17 instead of wmf-deployment
20:04 RoanKattouw: Fixed redirect loop on mlwiki a few minutes ago by changing the namespace names for NS_CATEGORY and NS_CATEGORY_TALK in MessagesMl.php
14:48 logmsgbot: catrope synchronized live-1.5/MWVersion.php 'Previous sync fixed mediawikiwiki for HTTP, was already working for HTTPS. Now fixed same bug for foundationwiki and oldwikisource so those will work once switched over'
22:02 logmsgbot: catrope synchronized live-1.5/resources 'Point this symlink to the 1.17 tree'
21:47 RoanKattouw: Fixed accidental svn1.6 contamination of php-1.17 by checking it out again with svn1.4, and copying modified, ignored and unversioned files. Kept the original tree around in /h/w/c/old-php-1.17-with-svn-1.6 in case something is missing, although diff -rq | grep -v .svn didn't think so
12:20 logmsgbot: demon synchronized php-1.17/includes/ChangesList.php 'Fix warning, $classes should be an array for insertTags()'
12:18 logmsgbot: catrope synchronized php-1.17/extensions/ParserFunctions/ParserFunctions.i18n.magic.php 'Touching this file so the magic words fix takes effect'
09:48 logmsgbot: catrope synchronized php-1.17/resources/jquery/jquery.client.js 'Attempting to fix JS error'
09:45 mark: Deploying /etc/php5/conf.d/wikidiff2.ini using puppet
09:43 logmsgbot: tstarling synchronized php-1.17/wmf-config/CommonSettings.php 're-enable wikidiff2, use extension_loaded() to test for existence instead of file_get_contents()'
02:14 logmsgbot: tstarling synchronized php-1.5/wmf-config/InitialiseSettings.php 'stats for phase 1 deployment'
02:11 logmsgbot: tstarling synchronized php-1.5/includes/GlobalFunctions.php 'configurable profile ID for wfIncrStats()'
01:29 logmsgbot: tstarling synchronized php-1.5/StartProfiler.php 'set up profiling section for the first batch of wikis which will be moved to 1.17, for baseline'
00:48 logmsgbot: tstarling synchronized php-1.5/includes/parser/ParserCache.php 'remove age stats'
07:39 Tim: switching master on s2 from db13 to db30
04:35 apergos: running rebuild of the first of 4 history dump pieces on snapshot3, root, screen session
February 6
21:27 apergos: splitting up stubs into pieces in prep for resume of truncated history dumps for en wiki; running on snapshot3 in screen session as root (see "part1,8,9,10.sh" in /backups-atg for what's running)
10:38 logmsgbot: jeluf synchronized php-1.5/wmf-config/CommonSettings.php 'testwiki: Use FreeSansWMF as default font for EasyTimeline'
10:38 logmsgbot: jeluf synchronized fonts/FreeSansWMF.ttf 'Merged FreeSerif into FreeSans. *should* now provide Malayalam characters'
07:54 apergos: filled up root filesystem on snapshot3 by accident running an xml dump repair job, cleared out the file and restarted the job writing elsewhere, we should be back to normal on that host now
06:49 apergos: four of the newly created en wikipedia history bz2 files are incomplete; they will be resumed after the 1.17 deployment has been pushed and is proven not to impact the dumps
February 5
22:35 logmsgbot: jeluf synchronized fonts/FreeSans.ttf 'New version from freefont-ttf-20100919'
22:33 logmsgbot: jeluf synchronized fonts/FreeSerif.ttf 'New version from freefont-ttf-20100919'
21:06 logmsgbot: jeluf synchronized php-1.5/wmf-config/CommonSettings.php 'testwiki: Use FreeSerif as default font for EasyTimeline'
21:01 logmsgbot: jeluf synchronized fonts/FreeSerif.ttf '21497 - Change default font for EasyTimeline on ML projects to something that actually has glyphs for Malayalam characters'
12:44 logmsgbot: hashar: Marked bug 24078 as wontfix; wikis have to request a local logo one by one
12:39 logmsgbot: hashar synchronized php-1.5/wmf-config/InitialiseSettings.php 'reverting back to commons logo since some wikis have upload disabled (per IRC)'
16:36 Ryan_Lane: pushing mod_expires fix to all apaches
16:33 Ryan_Lane: restarting apache on srv193 to force load of configuration
16:25 Ryan_Lane: testing mod_expires fix on test.wikipedia.org
04:49 logmsgbot: tstarling synchronized php-1.5/wmf-config/db.php 'adding a max lag so that servers executing schema changes will automatically be skipped'
12:08 RoanKattouw: Had to svn switch --relocate the tree from svn+ssh:// to http:// in order to be able to do so without knowing Priyanka's SSH key passphrase
12:07 RoanKattouw: Updated /srv/org/wikimedia/prototype/wikis/rc and its extensions on prototype
February 2
21:01 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'dpl nlwiki reverted to false'
20:55 logmsgbot: robh synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 8240 and 8563'
14:56 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'Enable subpages in API namespace on mediawikiwiki'
14:29 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'Set $wgSitename to Wikcionario on eswiktionary per IRC discussion and existing use'
13:54 mark: Unblocked some remoteLoaders in squid conf
19:07 Ryan_Lane: purging bin logs to mysql-bin.001858 on db9
18:33 RobH: updated puppet to move user pdhanda into mortals group, able to push apache config changes
18:18 Ryan_Lane: uninstalling php-openid from nova-controller.tesla
17:14 Ryan_Lane: installing php-openid on nova-controller.tesla
09:06 tomaszf: installed php5-dev and PECL hash on grosley for david
09:06 tomaszf: installed php-mhash on grosley
January 30
17:32 Ryan_Lane: changing otrs to use the star cert
17:18 Ryan_Lane: adding star cert to williams
14:53 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '26363 - Please Add Patroller User Group to fawikinews'
January 29
20:31 apergos: one more time for en wiki history dumps (looks like they might actually work this time). in screen on snapshot3, from /backups-atg.
08:46 apergos: starting enwiki history dumps (finally) with latest guesses at page ranges so we get more even no. of revisions per chunk... in screen on snapshot3, out of /backups-atg
08:11 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '26988 - Changing the logo for Buginese Wikipedia'
18:44 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '26915 - Modify $wgNamespaceRobotPolicies to noindex User and User talk namespaces on the French Wikipedia (fr.wikipedia.org)'
18:40 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '26897 - Enable flag 'flood' in Spanish Wikinews (es.wikinews)'
18:37 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '26946 - Create new group on Russian Wikipedia'
17:50 RobH: updated dns for harmon
17:23 mark: Moved ersch, alsted and harmon into the sandbox vlan
17:23 mark: Replaced ganglia aggregators for MySQL group
14:54 RobH: emery shutdown, needs to have drives swapped back to default
00:51 tomaszf: reset Renflauf pass on wikitech
00:13 richcole: thistle, ixia, lomaria going down for decommissioning
January 25
22:55 richcole: db1-4 going down for decommission and wipe
22:20 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '26889 - Set WB namespace alias to NS_PROJECT in Bengali wikibook'
22:14 RobH: project2 moved and back online
22:12 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '26916 - Enable subpages in Template namespace on wikimania (and for future editions)'
22:08 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '26913 - Please enable patrolled edits on the Hebrew Wikiquote (he.wikiquote.org)'
22:01 RobH: project2 (bugzilla testing) coming down for migration
21:39 logmsgbot: jeluf synchronized php-1.5/cache/interwiki.cdb 'Updating interwiki cache for new wikitech entry'
21:22 Ryan_Lane: deleting review.tesla.usability.wikimedia.org, and removing from DNS
21:17 Ryan_Lane: removing public IP from analytics.tesla.usability.wikimedia.org and removing it from dns
20:14 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '26932 - Add 'suppressredirect' right to 'eliminator' group on Portuguese Wikipedia'
18:02 RobH: srv281 repair work being done
16:38 Ryan_Lane: adding exim packages to nova-controller.tesla for mail sending from MediaWiki
January 24
23:53 Ryan_Lane: upgrading nova-* packages on nova-*.tesla
23:22 Ryan_Lane: rebooting nova-compute1/2
21:20 Ryan_Lane: rebooting nova-controller.tesla
18:56 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'Kill another Nogomatch remnant, commented-out experiment from 2005'
18:49 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Remove Nogomatch, experimental since 2005 and doesn't seem to log anywhere'
18:04 RobH: sq86 still above normal ms4 connection thresholds, leaving it at increased rate for now
17:24 mark: Applied ipv4 route-map to ipv6 AS 6908 BGP session on br1-knams, so IPv6 routes via 6908 2828 are preferred, and IPv6 connectivity between pmtpa and esams is working again
17:11 RobH: sq86 is back online with lucid, set ms4 maxcon to 800 while it catches up
16:59 mark: Rebooting spence
16:27 mark: AMS-IX migration complete, all BGP sessions back up again
16:17 mark: Shutting down all AMS-IX sessions for SFP+ ER test migration
16:01 RobH: sq86 coming down for upgrade to lucid
15:27 mark: Shutdown project1 for decommissioning
13:33 logmsgbot: demon synchronized php-1.5/wmf-config/CommonSettings.php 'Reverting codereview-proxy removal, did not work'
13:31 logmsgbot: demon synchronized php-1.5/wmf-config/CommonSettings.php 'Removing codereview-proxy step from CR'
13:05 mark: Purged payments1/payments2 resources from the exported Puppet database on db9
12:57 mark: Restarted ntpd on srv236
12:55 mark: Added puppet classes ntp::client and ganglia to snapshot1-3
12:50 mark: Unstuck APT on knsq29, and fixed puppet
12:47 mark: Fixed puppet on Kaulen
12:46 mark: Removed APT pin -10 for Wikimedia repository on Kaulen - this has been fixed
12:44 mark: Unstuck APT on knsq30, and fixed puppet
12:38 mark: Finished upgrade of Kaulen
12:28 mark: Unstuck APT on knsq8, and fixed puppet
12:12 mark: Starting distribution upgrade on Kaulen
12:03 mark: Patched Bugzilla on Kaulen
January 23
15:19 logmsgbot: jeluf synchronized php-1.5/wmf-config/CommonSettings.php '26886 - Move toggle for DoubleWiki to InitialiseSettings.php'
15:18 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '26886 - Move toggle for DoubleWiki to InitialiseSettings.php'
14:41 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php 'fix logo of ruewiki'
14:34 JeLuF: created Rusyn Wikipedia, ruewiki
14:14 logmsgbot: jeluf ran sync-common-all '25924 - Create Wikipedia in Rusyn'
13:51 JeLuF: addwiki.php broken. SQL error: "1283: Column 'si_title' cannot be part of FULLTEXT index (10.0.6.49)"
11:58 logmsgbot: hashar: I did add eowikisource yesterday. Will write a report to wikitech-l asap
11:48 logmsgbot: hashar synchronized php-1.5/wmf-config/InitialiseSettings.php '18479 - Set $wgUseDynamicDates = false for the English Wikipedia'
01:28 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '26323 - Requesting ReaderFeedback be enabled on ru.wikinews'
January 22
21:39 RobH: set sq83, sq84, and sq85 ms4 maxconn back to 100, all were well below and no longer needed increased max.
19:42 apergos: removing eowikisource, which was added to this file today but apparently doesn't exist; this breaks SUL-related things
14:19 hashar: ran rsync manually on srv287 after receiving this message from scap "rsync error: some files could not be transferred (code 23) at main.c(1385) [generator=2.6.9]"
14:10 logmsgbot: hashar ran sync-common-all 'Bug 16878 - Disable Special:CrossNamespaceLinks'
12:07 logmsgbot: hashar synchronized php-1.5/wmf-config/InitialiseSettings.php 'Bug 11776 - Showing also articles in the CategoryTree of categories in he-wiki'
11:49 logmsgbot: hashar synchronized langlist 'get rid of closed-zh-tw'
19:46 RobH: sq85 back online, frontend false, backend rebuilding, set to 800 maxconnection during rebuild
19:26 RobH: raised maxconn back to 400 on sq83, sq84, as they started to max out too often
19:18 RobH: upload service times back to normal, all upload squids online. taking down sq85 for upgrade to lucid.
19:06 RobH: reverting max_connections for sq83 & sq84 from 800 back to 100 as they have normalized well below the limit.
17:32 RobH: sq83/84 having empty caches on reinstall are maxing out ms4 connections for themselves, raising limits to 800 for them to catch up
16:53 RobH: sq85 coming down for lucid reinstallation
16:46 RobH: sq83 back in rotation
16:01 RobH: sq84 coming down for reinstallation to lucid
15:47 RobH: sq83 coming down for reinstallation to lucid
15:47 RobH: morebots died, restarted it on wikitech under my user
01:22 awjr: re-enabling automatic donation queue consumption in Hudson for queue2civicrm and recurring donation processing modules
January 20
23:42 logmsgbot: catrope synchronized php-1.5/wmf-config/ExtensionMessages.php 'Regenerated ExtensionMessages.php after removing unused extensions from extension-list and alphabetizing'
23:10 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Remove Configure code, was disabled two years ago'
22:12 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Disable AjaxTest, it's ancient and useless'
21:46 logmsgbot: demon synchronized php-1.5/wmf-config/InitialiseSettings.php 'And let bcrats grant membership'
21:46 logmsgbot: demon synchronized php-1.5/wmf-config/CommonSettings.php 'Adding new svnadmins group for repository administration.'
21:42 awjr: deploying and enabling 'recurring' module on CiviCRM
21:41 awjr: temporarily setting the 'Days before considering transaction too old to automatically thank' setting to 50 (from 14) for thank_you module in CiviCRM in order to automatically send thank-you notes for old recurring payment messages that have not yet been consumed
21:40 awjr: temporarily suspending automatic donation queue consumption (done by queue2civicrm module) in Hudson to live-test the new recurring payment handling module
16:02 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '26532 - Enable Collection extension on Estonian (et) Wikipedia.'
15:29 JeLuF: Added vhost stats.wikipedia.org on spence. Provides a redirect to stats.wikimedia.org
15:03 JeLuF: added DNS alias stats.wikipedia.org -> stats.wikimedia.org
11:30 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '16410 - Enable e-mail notification for watchlist and minor edits on portuguese Wikibooks'
10:47 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '26193 - Enable e-mail notification for watchlists on pt.wikiversity'
10:45 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '26771 - Enable e-mail notification for movement roles wiki'
10:36 logmsgbot: jeluf synchronized php-1.5/wmf-config/InitialiseSettings.php '26769 - Enable subpages on movement roles wiki'
10:18 logmsgbot: jeluf synchronized docroot/commons/favicon.ico '#23917 - Commons favicon broken on Safari & Chrome (Mac) // Also distorted on other browsers'
04:05 Tim: set up IMAP account for Mark Hershberger
03:05 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'Disable Drafts, causes issues in combination with LQT'
01:56 RoanKattouw: Restarted morebots
01:52 awjrichards: renamed fundcore_log table to queue2civicrm_log on drupal and dev_drupal databases on db9 in preparation for re-implementing fundcore module's logging feature in its replacement module, queue2civicrm
01:32 kaldari: deployed update to CentralAuth with new getCookieDomain method
00:37 robla: added commit access for appy (Apekshit Sharma)
00:19 Tim: removed pilot.wikimedia.org, was an old insecure MediaWiki instance
January 19
23:26 Ryan_Lane: added agarrett to engineering alias
21:45 Ryan_Lane: upgrading nova-* packages on nova-*.tesla
21:41 logmsgbot: hashar synchronized php-1.5/wmf-config/InitialiseSettings.php 'Bug 26787 - English Wikisource: Have Template: namespace converted to be subpages'
20:47 logmsgbot: hashar synchronized php-1.5/wmf-config/InitialiseSettings.php 'Bug 26126 - New "Collection" namespace for en.wikiversity.org'
20:25 logmsgbot: hashar synchronized php-1.5/wmf-config/InitialiseSettings.php 'Bug 26565 - Please enable upload on Persian Wikinews'
20:15 logmsgbot: hashar synchronized php-1.5/wmf-config/InitialiseSettings.php 'Bug 26572 - Enable subpages in main namespace on svwikiversity'
20:09 logmsgbot: kate synchronized php-1.5/wmf-config/db.php 'put db34 back'
19:44 hashar: frwiktionary WT was a namespace alias for NS_CATEGORY from 19:01 to 19:03 roughly. Now NS_PROJECT. Some pages might show unwanted categories; the issue can be solved by purging the page
19:03 logmsgbot: hashar synchronized php-1.5/wmf-config/InitialiseSettings.php 'Bug 26761 - was NS_PROJECT not 14!'
19:01 logmsgbot: hashar synchronized php-1.5/wmf-config/InitialiseSettings.php 'Bug 26761 - New namespace alias WT for project namespace on frwiktionary'
18:19 logmsgbot: hashar synchronized php-1.5/wmf-config/InitialiseSettings.php 'sync comments enabling vim folding (no code change)'
18:03 hcatlin: Mobile deploy complete and stable
18:02 RoanKattouw: Ganglia's fine
18:00 hcatlin: Ganglia seems to be down for me. "fsockopen error: Connection refused"
17:57 hcatlin: deploying mobile updates
17:44 logmsgbot: kate synchronized php-1.5/wmf-config/db.php 'remove db34 to dump s3'
17:38 logmsgbot: kate synchronized php-1.5/wmf-config/db.php 'put db16 back'
14:29 logmsgbot: kate synchronized php-1.5/wmf-config/db.php 'remove db16 to dump s7'
01:24 Ryan_Lane: fixing torrus
January 18
23:58 Ryan_Lane: upping CARP weight of sq79 to 30
23:50 Ryan_Lane: those last two sq49 logs were really sq79
23:50 Ryan_Lane: upping CARP weight for sq80-82 to 30, upping sq49 to 20
23:45 Ryan_Lane: upping CARP weight for sq80-82 to 20, upping sq49 to 15
23:28 Ryan_Lane: upping CARP weight for sq80-82 to 15
23:27 richcole: db15 reboot
23:10 RobH: db25 & db27 shutdown
23:07 Ryan_Lane: upping CARP weight on sq79-82 to 10
22:40 mark: Increased max-conns to ms4 on sq80 from 100 to 400
19:57 logmsgbot: hashar synchronized php-1.5/wmf-config/InitialiseSettings.php '26633 - Give commonswiki group 'Image-reviewer' right to Add groups: Image-reviewer'
11:40 logmsgbot: hashar synchronized closed.dblist 'bug 26569 - Close usability.wikimedia.org to editing'
11:11 logmsgbot: hashar synchronized php-1.5/wmf-config/InitialiseSettings.php '26643 - Allow sysops to add/remove a bunch of groups on siwiki'
10:15 Tim-away: removed /tmp/gs_* on srv219-224, root partition was full
January 15
01:53 apergos: starting en wikipedia xml dump, one phase at a time. from screen session on snapshot3, this will be a parallelized run
January 14
23:21 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'Set $wgLogo to $stdlogo on cswiki'
23:15 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'Set enwiki logo back to $stdlogo so local sysops can set up a special logo for WP day'
20:05 logmsgbot: hashar synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 26634 - allow bureaucrats to add/remove for group abusefilter'
15:03 logmsgbot: hashar synchronized php-1.5/wmf-config/InitialiseSettings.php '26634 - Activation of AbuseFilter on Romanian language Wikipedia'
15:02 logmsgbot: hashar synchronized php-1.5/wmf-config/abusefilter.php '26634 - Activation of AbuseFilter on Romanian language Wikipedia'
14:55 RobH: rebooting dataset1, it was left in webbios and thus not controllable by serial
14:24 logmsgbot: hashar synchronized php-1.5/wmf-config/InitialiseSettings.php '26361 - wgLocaltimezone for koquote'
06:53 Tim: on all squids, turning off debug output on 44,1, which was filling up cache.log with "Failed to select source" warnings when the backend was down
06:42 Tim: backend squid on sq41-48 all crashed at once, because their root partitions filled up some time ago, and then something HUP'd them all at once. Deleted cache.log and restarted squid.
03:45 Tim: commented out the $wgCentralAuthCookieDomain = '' in CentralNotice.php
17:48 RobH: all text squids in pmtpa/sdtpa have been upgraded to lucid
17:48 RobH: sq75-sq78 back in service, lucid
17:45 logmsgbot: mark synchronized php-1.5/wmf-config/db.php 'Remove db15 from rotation'
17:22 RobH: sq75-sq78 coming down for reinstall to lucid
17:19 RobH: sq71-sq74 reinstalled to lucid and pushed back into service
16:39 mark_: Increased CARP weight of amssq31 to 30
16:33 mark_: Powercycled amssq38
16:30 RobH: sq71-sq74 coming down for reinstallation to lucid
16:27 RobH: sq65-sq66 reinstalled to lucid and online
16:20 mark_: Lowered CARP weight of amssq* non-SSD text squids from 10 to 8 to relieve disk and memory pressure
16:20 apergos: running cleanup on index.html and md5sums for the XML dump jobs that recorded bogus "recombine" jobs that weren't actually run
16:18 mark_: Stopped old puppet instance and restarted squid-frontend on amssq37
16:16 RobH: amssq31 upgraded to lucid and back in service
16:05 RobH: both srv182 and srv183 were not responsive to serial console, rebooted both.
15:42 RobH: sq65 & sq66 coming down for reinstallation to lucid
15:40 RobH: sq62-sq64 installed and online lucid
15:31 RobH: reinstalling amssq31
15:11 RobH: sq62-sq64 down for reinstall
15:06 RobH: sq59-sq61 online as lucid hosts
14:59 RobH: updated wmf repo, copying package for squid from karmic to lucid as well (seems to have been lost from other changes, as frontend was still there)
14:02 RobH: sq59-sq61 offline for reinstall
12:03 mark_: Enabled multicast snooping on csw1-sdtpa
January 11
23:39 Ryan_Lane: adding cron on nova-controller.tesla to svn up /wiki hourly
22:27 RobH: have to reinstall the squids, wrong version written into lease
22:24 awjr: archived hudson build files for 'donations queue consume' and emptied builds directory on grosley to allow the jobs to continue running (donations queue consume job was failing due to hitting max # of files in a dir for ext3 fs)
22:05 RobH: correction, sq65, sq66, & sq71
22:04 RobH: sq62-sq64 back online, sq65-sq67 coming down for reinstall
21:48 Ryan_Lane: adding exim::simple-mail-sender to owa1-3
21:33 RobH: sq59-sq61 in service, sq62-sq64 reinstalling
21:27 Ryan_Lane: powercycling amssq31
20:52 RobH: sq59-61 reinstalled and online, pooled, partially up
20:42 rainman-sr: enwiki.spell index somehow got corrupt, investigating and rebuilding it now on searchidx1
18:57 RobH: sq59-sq61 coming back offline, bad partitioning in automated install, need to update squid configuration for these hosts
18:47 RobH: sq59 having reinstall issues, skipping it and moving on
18:36 RobH: sq61 reinstalled and back online
18:30 RobH: sq60 reinstalled, back in service
18:03 RobH: sq59-sq61 depooled for upgrade
16:15 RobH: sync-docroot run to push updated tenwiki favicon
14:27 mark_: Depooled amssq31 and amssq32 for SSD install
January 10
21:01 Ryan_Lane: patching python-nova on nova-compute*.tesla (see bug #lp700015)
20:59 Ryan_Lane: err make that #lp681164 for ldap driver
20:59 Ryan_Lane: patching ldap driver on nova-* (see bug #lp681030)
20:57 Ryan_Lane: patching ec2 api on nova-controller.tesla (see bug #lp701216)
19:56 RobH: updated dns with virt1-virt4 mgmt info
16:10 RobH: singer had crashed, investigating why it suddenly had issues. It was pulling down db9, but it halted before it could damage anything.
16:04 RobH: secure server, as well as blogs, offline, investigating server issue on singer
16:02 RobH: db9 having issues and singer is as well, taking singer down since it's already crashed
January 9
22:04 Ryan_Lane: powercycling srv217
22:03 Ryan_Lane: powercycling srv271
22:01 Ryan_Lane: powercycling srv262
19:21 apergos: rebooting amssq61 hoping to clear up its problem. I guess puppet restarts squid instances every so often *sigh*
18:50 apergos: stopping squid front and back end instances on amssq61, has network issue
14:15 mark: Reduced Varnish max worker threads from 8000 to 2000 per threadpool
13:52 mark: Pooled knsq6 and knsq7 as bits.esams
13:43 mark: Converted knsq6 and knsq7 into bits.esams machines
13:16 mark: Stopping squid instances on knsq6 and knsq7
13:16 mark: Depooled knsq6 and knsq7 from the knams.text pool
January 8
16:14 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Bump style version appendix'
16:53 logmsgbot: catrope synchronized php-1.5/wmf-config/CommonSettings.php 'Remove hack disallowing anon edits on tenwiki - moved to its proper place in InitialiseSettings.php'
16:52 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'Disallow anon edits on tenwiki'
16:50 logmsgbot: catrope synchronized php-1.5/wmf-config/InitialiseSettings.php 'bug 26554 - Set $wgAutoConfirmAge to 0 on tenwiki'
16:43 Ryan_Lane: removing login.tesla from DNS
15:42 apergos: snapshot1 removed from mediawiki-installation for now til set up again
15:35 Ryan_Lane: dropping nova database on nova-controller.tesla, letting nova recreate it
15:29 Ryan_Lane: changing ldap schema on nova-controller.tesla to newer (v2) openstack schema