Remove a message from mailing list archive

This page is about the removal of a message sent to a Wikimedia Foundation mailing list. Please read all this before doing anything. Talk to experienced mailman admins / moderators!

Current situation

Somebody sent a message to the list and now wants it removed from WMF's archive.

Facts:

  • The message has already been sent to all the list members, so it is public.
  • It has been archived in some public archives and possibly in private archives as well.

Removing a message from the archives is extremely painful for both the sysadmins and the users, because it requires a complete halt of all lists for a while, and it is usually useless: please consider whether it's really worth the effort. Sysadmins will remove messages only for serious privacy reasons; the current policy is that all requests are rejected (unless there is a court order or equivalent legal obligation, of course).

Public lists are usually mirrored by several services, so you'll need to get the private data deleted there before asking for removal from the WMF archives; otherwise the removal would make no sense. This is probably not possible, at least for bigger lists, which have more mirrors.

If the list is private, the removal will affect only future subscribers, not current ones (who have already received the email), so the request will usually be considered only if the list is big (i.e. many new subscribers are expected).

Obviously, don't publicly link to the content to be removed. It's currently unclear where such requests should be filed, who should decide to accept them and who should perform the removal. You can try and ask on the #wikimedia-tech IRC channel.

How-to

Private mail

Too late. By the time someone's begging you for help, the offending message has already been sent to the list subscribers (probably hundreds or thousands of email addresses). Drawing attention to the information will only worsen the problem (see Streisand effect).

Mailman archives

A message can be removed from mailman's own list archives, which are generally made available for public viewing via the web.

Note: we don't know why (Mailman/Archiver/pipermail.py seems rather predictable), but the numbering of the posts in the archives is not deterministic, and all posts will be renumbered anyway even if you follow the steps below. In other words, all links to the archives will be broken by this operation. Don't EVER do it unless you're absolutely forced to; see above.

  1. Locate the message in the web archives and load it up for reference.
  2. Stop the mailman runner so the mailbox won't be touched while you're working:
    /etc/init.d/mailman stop
  3. Back up the list archives for the list you're editing:
    cd /var/lib/mailman/archives/private
    mkdir wikifoo-l.backup
    cp -pr wikifoo-l wikifoo-l.mbox wikifoo-l.backup/
  4. Edit the mbox file in vi or another editor:
    vi wikifoo-l.mbox/wikifoo-l.mbox
  5. Find the message you want and replace only the offending content in it with a "this message deleted" message.
    Do not remove the entire message, as this will cause URLs to later messages in the archive to break! [1] [2]
  6. Check the number of the earliest message archived by pipermail: by default, it's 000000, i.e. 0. If it is different, you have to ensure that the new archive's numbering starts from the same first number. How to do this is not known; dummy messages like this one might work: [3].
  7. Rebuild the list archives:
    /usr/lib/mailman/bin/arch --wipe wikifoo-l
    • Note: this step may fail if the mbox file's content is invalid for the now-stricter recent versions of mailman, for instance invalid date formats or characters.[4] [5]
    • DANGER DANGER: It is quite possible that the above command will change the pipermail numbers and thus break all links. To avoid that, it might be best to edit only the HTML file of the archived message in place and never run the arch command at all.
  8. Reload that message from the web archives and make sure it looks ok.
  9. Start the mailman runner:
    /etc/init.d/mailman start
  10. Leave the backup there for diagnosis purposes in case something has broken and you haven't noticed.
  • If you «Check the .mbox file with bin/cleanarch to make sure there are no unescaped From_ lines in message bodies» [6] beware that «this may renumber messages and break saved links to archived messages» [7], but you can «escape those lines and preserve archive order [by] escaping the line and adding an empty "From " immediately after the containing message».[8]

Gmane

There is some sort of procedure for this. With luck, the petitioner will e-mail the Gmane people themselves and you won't have to worry about it.

Other archives

There may be other public archives of the lists built from addresses subscribed to the lists. There is no automated way to deal with this.