Ops notifications

From Wikitech

The operations team announces planned maintenance and upgrades, emergency outages, and day-to-day work. Below is a summary of the channels ops uses for these notifications.

IRC

Primary channel: #wikimedia-operations at irc.freenode.net

Use: day-to-day work and discussion happen here. Entries in the Server Admin Log are also made from this channel.

Secondary channel: #wikimedia-tech at irc.freenode.net

Use: during an outage, someone is usually monitoring this channel, watching for problem reports and posting status updates.

Other:

Sensitive matters involving possible security or privacy issues are discussed elsewhere, for obvious reasons.

Server Admin Log (SAL)

Server Admin Log

Use: power-cycling machines, service restarts, upgrades, configuration changes, etc. are logged here. During long outages, brief status updates are posted here as well.

Twitter

Everything posted to the Server Admin Log is also posted to Twitter [1], under the account @wikimediatech.

Wiki

Outage post-mortems are published on-wiki as incident reports.

Phabricator

When bugs involving operational issues are opened, follow-up happens here (publicly available [2]).

Email

No single public email channel is currently used for updates during site outages or after recovery. (Should there be one, or do the channels above cover this?)

Gerrit

You can browse Gerrit to see which changes have been merged in ops-related projects, e.g. operations/puppet and other operations/* repositories.
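For scripted checks, Gerrit's REST API can list merged changes with a query such as `project:operations/puppet status:merged`. The sketch below is illustrative only: it assumes the REST endpoint lives at `https://gerrit.wikimedia.org/r/changes/`, and the helper names (`merged_changes_url`, `parse_gerrit_json`, `fetch_merged_changes`, `summarize`) are hypothetical. It relies on Gerrit's documented behaviour of prefixing JSON responses with `)]}'`, which must be stripped before decoding.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# ASSUMPTION: base URL of the Wikimedia Gerrit REST API; verify before relying on it.
GERRIT_BASE = "https://gerrit.wikimedia.org/r"

# Gerrit prefixes JSON responses with this marker to guard against XSSI.
XSSI_PREFIX = ")]}'"


def merged_changes_url(project: str, limit: int = 10) -> str:
    """Build a /changes/ query URL for recently merged changes in one project (hypothetical helper)."""
    query = f"project:{project} status:merged"
    # urlencode percent-encodes ':' and '/' and turns spaces into '+'.
    return f"{GERRIT_BASE}/changes/?" + urlencode({"q": query, "n": limit})


def parse_gerrit_json(body: str):
    """Strip Gerrit's XSSI-protection prefix, if present, and decode the JSON payload."""
    if body.startswith(XSSI_PREFIX):
        body = body[len(XSSI_PREFIX):]
    return json.loads(body)


def fetch_merged_changes(project: str, limit: int = 10):
    """Fetch merged changes over the network (requires access to the Gerrit host)."""
    with urlopen(merged_changes_url(project, limit)) as resp:
        return parse_gerrit_json(resp.read().decode("utf-8"))


def summarize(changes):
    """Render each ChangeInfo record as 'number: subject'."""
    return [f"{c['_number']}: {c['subject']}" for c in changes]


if __name__ == "__main__":
    for line in summarize(fetch_merged_changes("operations/puppet", limit=5)):
        print(line)
```

Swapping the project name (for example another operations/* repository) reuses the same query shape.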