Help:Tool Labs/Backups

From Wikitech

Users' and service groups' home directories are backed up automatically every week, and the resulting backups are made available at /public/backups/username/project/. By default the backup requires no setup, but users can configure what gets backed up (see "Configuring backups" below).

Backups are either tarballs (for files and directories) or SQL dumps (for databases). They are compressed with bzip2 and named:

YYYYMMDD-name.tar.bz2 or
YYYYMMDD-name.sql.bz2

where YYYYMMDD is the date on which the backup was taken, and name is a unique identifier derived from the path or database name.
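As an illustration of the naming scheme, the following Python sketch splits a backup file name into its date, identifier and kind. The helper name parse_backup_name and the BackupName tuple are hypothetical and not part of the backup system; the sketch only assumes the two file name forms described above.

```python
import re
from datetime import date, datetime
from typing import NamedTuple, Optional

# Matches the two documented forms: YYYYMMDD-name.tar.bz2 and YYYYMMDD-name.sql.bz2
_BACKUP_RE = re.compile(r"^(\d{8})-(.+)\.(tar|sql)\.bz2$")


class BackupName(NamedTuple):
    taken_on: date  # date encoded in the YYYYMMDD prefix
    name: str       # identifier derived from the path or database name
    kind: str       # "files" for tarballs, "database" for SQL dumps


def parse_backup_name(filename: str) -> Optional[BackupName]:
    """Return the parts of a backup file name, or None if it does not fit the scheme."""
    match = _BACKUP_RE.match(filename)
    if match is None:
        return None
    try:
        taken_on = datetime.strptime(match.group(1), "%Y%m%d").date()
    except ValueError:
        # Eight digits that do not form a valid calendar date.
        return None
    kind = "files" if match.group(3) == "tar" else "database"
    return BackupName(taken_on, match.group(2), kind)
```

For example, parse_backup_name("20140302-public_html.tar.bz2") yields a date of 2 March 2014, the identifier "public_html" and the kind "files".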

There is no guarantee about when a backup takes place, except that one will be taken at least once a week.

What gets backed up?

By default, the contents of the home directory are backed up recursively, except:

  • Log files (ending in .log, .out or .err).
  • Git repositories (any directory containing a .git subdirectory).
  • Any directory containing a file named NOBACKUP (regardless of its contents).

No databases get backed up by default.
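The default rules can be approximated with a short directory walk. This Python sketch is not the backup system's actual code; the function name default_backup_files is hypothetical, and it assumes that an excluded directory is skipped together with everything beneath it.

```python
import os
from typing import Iterator

# Suffixes treated as log files by the default rules.
LOG_SUFFIXES = (".log", ".out", ".err")


def default_backup_files(home: str) -> Iterator[str]:
    """Yield the files under `home` that the default rules would keep."""
    for dirpath, dirnames, filenames in os.walk(home):
        is_git_repo = ".git" in dirnames       # directory has a .git subdirectory
        opted_out = "NOBACKUP" in filenames    # marker file, contents ignored
        if is_git_repo or opted_out:
            dirnames[:] = []  # assumption: do not descend into excluded directories
            continue
        for filename in sorted(filenames):
            if not filename.endswith(LOG_SUFFIXES):
                yield os.path.join(dirpath, filename)
```

Running list(default_backup_files(os.path.expanduser("~"))) gives a rough preview of what a default backup would contain, which can help when deciding whether a NOBACKUP marker or a .backuprc is needed.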

Configuring backups

Users can place a .backuprc file in their home directory, which overrides the default (whole home) backup. This is a text file containing one directive per line. The following kinds of directives are understood:

cluster:dbname
Requests that the database dbname be dumped and backed up. The cluster is the same name you use to connect to the database, such as "tools.labsdb" or "enwiki.labsdb" (without the quotes).
-pattern
Requests that files matching pattern be excluded from backups. This may be a specific filename (such as "directory/somefilename") or a glob pattern (such as "*.o").
path
Lines that neither contain a colon nor start with a dash are interpreted as names of files and directories to be backed up. If they do not start with a forward slash ("/"), they are considered relative to the user's home. Note that files outside the shared NFS directories (/home and /data/project) cannot be backed up even if explicitly listed.

Any errors in your .backuprc will be reported in a file named backup.err upon the next backup attempt.

For instance, the following .backuprc:

tools.labsdb:s12345__foo
public_html
work
-work/tmp
-*.o

would back up the database "s12345__foo" from the tools.labsdb cluster, as well as the ~/public_html and ~/work directories (but not ~/work/tmp, if present) from the user's home, provided, of course, that the user can actually read those directories and that database. In addition, no object files (matching *.o) will be backed up, regardless of where they are.
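To make the three directive kinds concrete, here is a minimal Python sketch of how lines of a .backuprc could be classified. It is not the backup system's parser; the names Directive, classify, parse_backuprc and resolve_path are hypothetical. It assumes that a leading dash takes precedence over a colon and that blank lines are ignored, neither of which is stated above.

```python
import posixpath
from typing import List, NamedTuple, Optional


class Directive(NamedTuple):
    kind: str                      # "database", "exclude" or "path"
    value: str                     # database name, pattern or path
    cluster: Optional[str] = None  # set only for database directives


def classify(line: str) -> Directive:
    """Classify one .backuprc line according to the rules described above."""
    line = line.strip()
    if line.startswith("-"):
        # Assumption: the dash is checked before the colon.
        return Directive("exclude", line[1:])
    if ":" in line:
        cluster, dbname = line.split(":", 1)
        return Directive("database", dbname, cluster)
    return Directive("path", line)


def parse_backuprc(text: str) -> List[Directive]:
    """Classify every non-blank line (assumption: blank lines are ignored)."""
    return [classify(line) for line in text.splitlines() if line.strip()]


def resolve_path(home: str, path: str) -> str:
    """Paths without a leading "/" are taken relative to the user's home."""
    return posixpath.join(home, path)
```

Applied to the example file above, parse_backuprc returns one database directive (cluster "tools.labsdb", database "s12345__foo"), two path directives ("public_html" and "work") and two exclusions ("work/tmp" and "*.o").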

How long are backups kept?

By default, the backup system keeps at least the latest two completed weekly backups of the current month, and the last weekly backup of each of the previous three months. There is, however, a soft limit of 10 GB of (compressed) backup data per user. If the latest backups and the older ones together exceed that limit, the system will first try to discard older weekly backups within the month, then older monthly backups. In no case will the latest weekly backup or the latest previous month's backup be deleted.
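The retention policy can be read as a small selection algorithm. The Python sketch below is one possible interpretation, not the system's implementation. It assumes one entry per weekly backup run with its total compressed size, treats 10 GB as 10 × 1024³ bytes, and discards oldest candidates first. The names Backup, SOFT_LIMIT and select_kept are hypothetical.

```python
from datetime import date
from typing import Dict, List, Tuple

Backup = Tuple[date, int]    # (date taken, compressed size in bytes)
SOFT_LIMIT = 10 * 1024 ** 3  # assumption: "10GB" means binary gigabytes


def _month_index(day: date) -> int:
    """Number months consecutively so that month differences are simple."""
    return day.year * 12 + day.month


def select_kept(backups: List[Backup], today: date,
                limit: int = SOFT_LIMIT) -> List[Backup]:
    """Return the backups to keep, oldest first, under the stated assumptions."""
    now = _month_index(today)
    ordered = sorted(backups)
    # Latest two weekly backups of the current month.
    current = [b for b in ordered if _month_index(b[0]) == now][-2:]
    # Last weekly backup of each of the previous three months.
    last_per_month: Dict[int, Backup] = {}
    for backup in ordered:
        age = now - _month_index(backup[0])
        if 1 <= age <= 3:
            last_per_month[age] = backup  # later dates overwrite earlier ones
    monthly = sorted(last_per_month.values())
    kept = monthly + current
    if not kept:
        return []
    # Never discard the latest backup or the latest previous month's backup.
    protected = {kept[-1]}
    if monthly:
        protected.add(monthly[-1])
    # Discard order: older weekly backups first, then older monthly backups.
    candidates = ([b for b in current if b not in protected]
                  + [b for b in monthly if b not in protected])
    for victim in candidates:
        if sum(size for _, size in kept) <= limit:
            break
        kept.remove(victim)
    return kept
```

Because the limit is soft, the result may still exceed it when only the two protected backups remain.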