After describing the difference between archival and mirroring backups and some common backup tools used for each by Unix system administrators, I describe the general considerations that go into setting up an archival backup system, and describe the tools I use for archival backups. I conclude with a briefer section of notes on mirroring backup -- briefer both because the technology is simpler, and because I use it less.
The baseline motivation behind all backup systems is disaster recovery: You want to ensure that your files will survive all hardware failures that Murphy's Law might conceivably throw at you. All backup technologies meet this goal by making a copy, but there are really two kinds of copies, with distinct recovery characteristics: archival and mirroring.
Archival backup gives you the ability to travel through time: If you suddenly realize that an important file is missing, and you're not sure when it was deleted, then the ability to sift through a year of backup dumps looking for the missing file can be a life-saver. In order to do this, however, you must keep a lot of data around, and that almost always means putting the backup dumps on some sort of offline storage.
Mirroring backup gives you immediate access to the most recent copy of your data; if you deleted that important file just this morning, then it's a snap to go get it from the backup drive, without any searching. On the other hand, if you deleted it before the last mirroring operation, you are completely out of luck. At a minimum, mirroring only requires a spare disk of comparable size, and is easy to automate completely, as it requires no manipulation of offline media.
The "entry-level" backup options for Linux (and Unix systems generally) tend to provide either archival or mirroring, but not both. They are:
Fortunately, it is possible to have both archival and mirroring backup, for those that need it. For small to medium installations where high availability is important, you can install a hybrid system where archival dumps are created on a primary server, copied to a backup server for safe-keeping, and also restored onto the backup server's disks for quick access in the event that the primary server fails.
And for many small installations, archival backups are sufficient. This is all I need at home, in fact.
It is also possible to do mirroring without archival, though I myself would not recommend it. But the low maintenance of an rsync solution may make it the most appealing option for some -- just be clear that you're giving up your "data history" when you pass on archival.
In order to address these drawbacks, it is useful to define a backup level between 0 and 9 that controls how comprehensive to make the backup. Each level k backup contains a snapshot of all files changed since the most recent dump made at a lower level. Level 0 is therefore the most comprehensive, and level 9 is the most "incremental." At this point, some additional terminology is in order: a full dump (level 0) contains everything on the partition; a consolidated dump (level 1) contains everything changed since the last full dump; and the daily incrementals or "dailies" (levels 2 through 9) each contain only a day or two of changes.
In order to reduce the number of incrementals required, one can use the "modified Tower of Hanoi algorithm" described in the dump manpage, which prescribes the following sequence of incremental dump levels (after having made a full or consolidated dump):
    3 2 5 4 7 6 9 8 9 9 ...

These levels are used for daily backups, which is the minimum acceptable frequency for a workgroup server in an office environment. (The point of the zig-zag sequence is that each even-level daily also picks up the previous odd-level daily's changes, so every day's work ends up on two different dumps.) At the end of the week, a consolidated dump is performed, and the daily cycle starts over again. At this point, last week's incrementals could be thrown away, as they are no longer needed for disaster recovery, but it's a good idea to keep them around for at least a month in order to cover the "I didn't mean to delete that" syndrome.
In any case, this multilevel backup system turns out to be quite
effective in reducing the size of backups; even after a month, a
consolidated dump can be only about 20% of the size of the full dump,
and the daily incrementals only 3 to 5%.

Backup frequency

Deciding how often to make backups requires making a tradeoff between how many days of work you are willing to lose versus how much effort you have to spend on performing each backup. That is why a high degree of automation is a great advantage; it costs essentially nothing to take backups every day. My automated system costs me only 5 to 10 minutes per week, mostly to write consolidated backups to CD, and changing the daily backup schedule wouldn't affect that at all.
For less automated systems, the cost may be 5 to 10 minutes for each
backup dump. A system failure that requires restoring from backups
could happen at any time during the backup cycle, which means that the
expected amount of work lost for each failure is half of the usage
between backup intervals. In other words, if the system is backed up
after every 40 hours of use, then the expected loss due to backup
failure is 20 hours. It seems reasonable to set the expected loss over
the course of a year equal to the planned time investment, and then
solve for the backup frequency in order to find a value that minimizes
expected total effort. (Finding the true optimum probably isn't much
harder, but it's not clear that it's worth the effort.) If we do that,
we get:

    I*f = W*F/(2*f)

where I is the time spent making each backup, f is the number of backups per year, W is the annual amount of system usage, and F is the expected number of system failures per year. Solving for f gives

    f^2 = W*F/(2*I)
    f = sqrt(W*F/(2*I))
Of course there are other costs to consider, such as inconvenience to
customers (and staff embarrassment) when you have to admit that you lost
their emails, but these mostly define a "maximum acceptable loss"
ceiling, underneath which it is still desirable to seek an optimum.
If there is only one user who uses the system for 40 hours per week,
and who does their own backups, then we have what might be called the
"standard home office scenario." For this scenario, and assuming that
(a) backups take 10 minutes on average, and (b) the system is likely to
fail once per year on average (which might or might not be pessimistic),
then we arrive at the following optimal backup frequency for the home
office:

    f_opt = sqrt((120000 min/yr * 1 failure/yr) / (2 * 10 min))
          = sqrt(6000) = 77.5 per yr
This works out to be three times every two weeks, for a total time
investment (or expected time lost due to data recovery) of
77.5*10 = 775 minutes, or about 13 hours. We might
want to round this frequency up to twice per week; the time investment is then 1000 minutes (almost 17 hours!), but the expected time lost drops to only
10 hours (a quarter of a week).
Most changes to this minimal scenario have the effect of driving the
ideal backup frequency up. If ten people are using the system via file sharing, then the amount of potential lost work is ten times
higher, and so it becomes worth investing that 10 minutes every working
day (the actual optimal frequency is nearly 245 backups per year). If
the time of the person making backups is only worth half as much as that
of the average file server user (in which case we should optimize the
dollar cost), then the "daily is optimal" point would be reached with
only 4 or 5 additional server users. The end result is that it rarely
makes sense for small offices with shared file servers to do backups any
less often than daily. If the resulting 41 hours per annum of staff
time spent on backups becomes excessive, then it's time to increase the
level of backup automation.

Backup timing
Backup timing is also important, though often overlooked. If the backup
system makes its copy of a given file while an application is partway
through updating it, the copy that winds up on the backup medium may be
inconsistent, and would appear to be corrupted to the application if it
were ever restored. For this reason, it is best to make backups at
times when the file system isn't changing. The middle of the night is
therefore ideal.
Another solution to the "changing data during dump" problem is to
remount the filesystem read-only before performing the backup. This has
never been practical for me; if the partition is exported via NFS (true
for all partitions I need to back up), I would need to unmount it on all
clients, possibly disrupting shell sessions or other long-running
processes. The closest I've come is to edit /etc/fstab to mark
the partition as read-only temporarily and then reboot, but that doesn't
work for automated nightly backups, so I've only done it when making
extra just-in-case backups before server upgrades, when I am planning to
reboot the system anyway.
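
For reference, when a partition is not exported and nothing has files open for writing on it, the remount itself is just a pair of commands around the dump (shown here for /home purely as an illustration):

    mount -o remount,ro /home
    mount -o remount,rw /home

The first command will refuse to run if some process still has a file open for writing on the partition, which is itself a useful check that the filesystem really is quiescent.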
A particularly nasty case of backup-induced corruption can be caused
by backing up the files used by a relational database management system
(RDBMS) to implement tables. A transaction that updates multiple tables
may be in different stages of being written to disk for each table, so
the backup might be inconsistent even if it could be done
instantaneously. There are really only two choices for archival backup
of a database: Stop the RDBMS server completely (e.g. "systemctl
stop mariadb") during the backup, or use a database client backup
program (e.g. mysqldump for the MariaDB system). Doing the latter is
more robust, since it makes it more likely that old database content can
be restored into a much later version of the RDBMS system.
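
For example, a nightly job along these lines (the output path is just an illustration, and authentication options are omitted) writes everything out as plain SQL that the regular file-level backup can then pick up; the --single-transaction option asks mysqldump for a consistent snapshot of transactional (InnoDB) tables without locking out writers:

    mysqldump --all-databases --single-transaction \
        > /scratch/backups/mariadb-$(date +%Y%m%d).sql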
For similar reasons, backing up more than once a day is probably not
worth the bother. The only predictable period during the day when the
file system is highly unlikely to change is during the night when all
users are asleep. And, for just those reasons, doing more than one
backup during this period would be pointless.

Automated backups with cron
A regular weekly schedule is easy to automate via cron jobs. The
crontab entries for the full schedule for my /home
partition look like this:

    # At 03:00 every night, do a /home backup.
    00 03 * * Mon /usr/local/bin/home-backups /dev/mapper/boot-home 3
    00 03 * * Tue /usr/local/bin/home-backups /dev/mapper/boot-home 2
    00 03 * * Wed /usr/local/bin/home-backups /dev/mapper/boot-home 5
    00 03 * * Thu /usr/local/bin/home-backups /dev/mapper/boot-home 4
    00 03 * * Fri /usr/local/bin/home-backups /dev/mapper/boot-home 7
    00 03 * * Sat /usr/local/bin/home-backups /dev/mapper/boot-home 6
    00 03 * * Sun /usr/local/bin/home-backups /dev/mapper/boot-home 1
    # [full backup recipe. -- rgr, 10-Apr-04.]
    # 00 01 * * Mon /usr/local/bin/home-backups /dev/mapper/boot-home 0
When I first automated this process, I tried doing them daily, but
that got to be too much work, because I didn't change that many files,
and I still had to copy the backups to offline storage manually. Making
the backup was no help from the point of view of disaster recovery if I
didn't copy it to another disk fairly promptly. Consequently, I only
did the level 1, 2, 4, and 6 dumps in the crontab schedule
above. Then I got a new desktop machine and set the old one up as a
server, which made it possible to copy all dumps automatically from the
server to the disk on the new machine.

Tools for archival backup dumps

Archival backups with tar
The traditional Unix tar program is not well suited to
making backups because it is only capable of creating full dumps.
However, the GNU
tar program can do incremental backup dumps, and it's
probably already installed on your GNU/Linux system, so it's worth
mentioning along with the other possibilities.
To keep track of what's been dumped so far, tar uses what it
calls a "snapshot file," which is passed to the
--listed-incremental option. Every time tar is run
with this option, it consults this file for the state of the filesystem
at the last backup, and updates it to reflect the state of the filesystem as of the new backup. If the named snapshot file does not exist, then it is created, and the resulting tarfile contains a full dump. To keep the original snapshot file from being modified (which would make subsequent consolidated dumps impossible), it must be copied for each later backup, so that each tarfile backup gets its own snapshot, at least until it is superseded by a more recent one.
The resulting backup protocol looks like this:

    # cd /home
    # tar --create --xz --file=/scratch/backups/home-20210103-l0.tar.xz \
          --listed-incremental=/scratch/backups/home-20210103-l0.snap .
    # tar --diff --file=/scratch/backups/home-20210103-l0.tar.xz
    #

This will create both files in /scratch/backups/. Adding the --xz option requests xz compression, so we've added a suffix to match.

The next night, the first daily (a level 3) copies the level 0 snapshot and dumps against it:

    # cd /scratch/backups
    # cp home-20210103-l0.snap home-20210104-l3.snap
    # cd /home
    # tar --create --file=/scratch/backups/home-20210104-l3.tar \
          --listed-incremental=/scratch/backups/home-20210104-l3.snap .
    # tar --diff --file=/scratch/backups/home-20210104-l3.tar
    #

After tar --create, home-20210104-l3.snap is updated with filesystem changes that tar found and wrote to the new backup file. (We've omitted the compression here because it's less necessary to compress the dailies; they are usually much smaller and don't hang around as long.)

Tuesday night's level 2 dump works the same way:

    # cd /scratch/backups
    # cp home-20210103-l0.snap home-20210105-l2.snap
    # cd /home
    # tar --create --file=/scratch/backups/home-20210105-l2.tar \
          --listed-incremental=/scratch/backups/home-20210105-l2.snap .
    # tar --diff --file=/scratch/backups/home-20210105-l2.tar
    #
And so on through the rest of the week.
Then, on the following Sunday, it is time for the consolidated dump:

    # cd /scratch/backups
    # cp home-20210103-l0.snap home-20210110-l1.snap
    # rm -f home-*-l[2-9].snap
    # cd /home
    # tar --create --xz --file=/scratch/backups/home-20210110-l1.tar.xz \
          --listed-incremental=/scratch/backups/home-20210110-l1.snap .
    # tar --diff --file=/scratch/backups/home-20210110-l1.tar.xz
    #

We have gone back to home-20210103-l0.snap to get our consolidated dump, and we can get rid of the snapshots for the dailies, since we won't be making incrementals from them -- and likewise for previous consolidated dump snapshots, if there had been any.
[I am considering extending the backup.pl Perl script described below
to support GNU tar backups. However, because GNU tar
requires keeping track of the additional snapshot file, it would require
extensive changes to the backup code infrastructure, and seems harder to
automate, so it's not clear that it's worth it. -- rgr, 26-Jan-21.]
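
For completeness, restoring from such a chain of GNU tar backups is a matter of extracting the full dump first and then each later dump in date order; passing --listed-incremental=/dev/null tells tar to treat each archive as an incremental one, so that files recorded as deleted are removed as well. A sketch, using the file names from the example above:

    # cd /home
    # tar --extract --listed-incremental=/dev/null \
          --file=/scratch/backups/home-20210103-l0.tar.xz
    # tar --extract --listed-incremental=/dev/null \
          --file=/scratch/backups/home-20210110-l1.tar.xz
    #

followed by any dailies made after the consolidated dump, oldest first.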
Archival backups with dump

Historically, I used the standard, tried-and-true dump and
restore programs. More recently I have used dar
for backups; it has some advantages and disadvantages over
dump; see below for details.
An interesting characteristic of dump is that it accesses
the raw device file (e.g. /dev/hda5) of ext2,
ext3, and ext4 file systems, instead of going through
the file system interface, which means that it can only work on whole
partitions, and it can miss file data that
is cached in RAM and not yet written to disk.
The pros and cons of this approach are discussed on the "Is dump really
deprecated?" page of the Dump/restore utilities project.
In a nutshell, it makes the "changing data during dump" problem worse (though there are ways around this), but it has the unique advantage that
partitions can be dumped without affecting any of the times recorded by
the file system. And (as also mentioned in the "Backup timing" section) data that is changing during
the backup is only one of the tradeoffs you need to consider when
setting up a backup system.
[Note that the backup.pl Perl
script described below used to support dump to create
backup files, but I dropped that support in 2017. -- rgr,
25-Jan-21.]
To see how many bytes are likely to be written to a dump file, use
the "-S" option to dump, e.g.

    dump -S2 /dev/hda9

for a level 2 dump of the /dev/hda9 partition.
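
For reference, actually writing that level 2 dump to a file named according to the convention described below might look something like this (an illustrative sketch; the -u option updates /etc/dumpdates so that later incremental dumps know what this one covered):

    dump -2 -u -f /scratch/backups/home-20210105-l2.dump /dev/hda9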
Archival backups with dar

"dar" stands for Disk ARchive, and has a number of advantages with respect to dump:

Unfortunately, there are also a few disadvantages:
All in all, I find the drawbacks minor, and have come to prefer
dar; it has been my standard backup tool since 2008.
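
To give a flavor of what this looks like at the command line, here is a minimal sketch of the kind of raw dar commands that backup.pl wraps; the archive basenames follow the naming convention described below, the slice size and compression settings are arbitrary choices, and -A names the reference archive that makes the second dump a differential (level 2) one:

    dar -c /scratch/backups/home-20210103-l0 -R /home -z -s 2G
    dar -c /scratch/backups/home-20210105-l2 -R /home -z \
        -A /scratch/backups/home-20210103-l0
    dar -t /scratch/backups/home-20210105-l2

The last command tests the archive just written, which is presumably roughly what backup.pl's verification step amounts to.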
Backup tools in the "scripts" project

To assist in setting up a backup system, I have written a series of
Perl scripts that help to automate the tedious parts. These are part of
the "scripts" project at
Github and are available under an open-source license.
With these tools, it is possible to automate the system backups of a
small to medium office site (up to 20 daily users) to a high degree. A
cron job runs backup.pl to create backup files on the
primary server in a scratch (non-backed-up) partition which are then
copied by vacuum.pl in a second cron job to another
scratch partition on a secondary system for safekeeping. Once on the
secondary server, one can use cd-dump.pl to write the full and
consolidated dumps to offline media without loading the primary server.
Other cron jobs on each system can then run
clean-backups.pl periodically to remove the oldest daily
backups in order to preserve sufficient room for new backups. Depending
on the amount of scratch disk space available and the volume of daily
file system churn, the sysadmin only needs to intervene a few times per
month to write offline media and to remove excess full and consolidated
dumps.
These tools use a backup file naming convention in order to make it
easier to keep track of what may amount to thousands of backup files
collected from multiple systems over many years. All such files match
"<prefix>-<date>-l<level>.<slice>.dar"
for dar backups or
"<prefix>-<date>-l<level><idx>?.dump"
for dump backups, where <prefix> names the backed-up partition (e.g. "home"), <date> is the dump date in YYYYMMDD form, <level> is the backup level digit, <slice> is the dar slice number, and <idx> is an optional index that distinguishes multiple dump files made for the same partition on the same day.
To install the "scripts" backup tools:

    # git clone https://github.com/rgrjr/scripts
    # cd scripts
    # make install-backup
    ...
    #
By default, this will put the above scripts into
/usr/local/bin (along with backup-dbs.pl and
svn-dump.pl, which are not discussed here, as they are more
specialized), as well as installing the classes they use where Perl can
find them.

The backup.pl Perl script
When backup.pl is run by root, it creates and verifies a set
of backup files using the dar
program. Usage is

    backup.pl [ --test ] [ --verbose ] [ --usage|-? ] [ --help ]
              [ --date=<string> ] [ --name-prefix=<string> ]
              [ --file-name=<name> ]
              [ --dump-program=<dump-prog> ] [ --[no]dar ]
              [ --gzip | -z ] [ --bzip2 | -y ] [ --compression[=[algo:]level] ]
              [ --dest-dir=<destination-dir> ] [ --dump-dir=<dest-dir> ]
              [ --volsize=<max-vol-size> ]
              [ --target=<dir> | <dir> ] [ --level=<digit> | <level> ]
See the documentation in the script for argument descriptions, known
bugs, and other details.
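
For instance, going by the synopsis above, a level 2 dump of /home written to /scratch/backups might be requested with something like:

    backup.pl --dest-dir=/scratch/backups --level=2 /home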
Listing backups with show-backups.pl

The show-backups.pl script lists all backup files it can
find under the search root(s) that follow the naming convention described above.
The default search roots are any directories that match
"/scratch*/backups" but that can be overridden by specifying
directories on the command line. Other options exist to constrain the
search by level, date, and prefix, and to modify the output format.
Usage is as follows:

    show-backups.pl [ --help ] [ --man ] [ --usage ] [ --prefix=<pattern> ... ]
                    [ --[no]slices ] [ --[no]date | --sort=(date|prefix|dvd) ]
                    [ --before=<date> ] [ --since=<date> ] [ --size-by-date ]
                    [ --level=<level> | --level=<min>:<max> ]
                    [ <search-root> ... ]

where:

    Parameter name   Default  Explanation
    --before                  If specified, only show dumps made on or before this date.
    --help                    Print detailed help.
    --level          all      If specified, only show dumps in this level range.
    --man                     Print man page.
    --prefix                  Partition prefix on files; may be repeated.
    --since                   If specified, only show dumps made since this date.
    --size-by-date   no       Print a table of total size by dump date.
    --slices                  If specified, print only slice file names.
    --sort           prefix   Sort by prefix, date, or dvd order.
    --usage                   Print this synopsis.
Here is an example of the output:

    # show-backups.pl --since 2021-1-1
     *   56890485 home-20210127-l5.1.dar [orion:/scratch/backups/]
     *   64615879 home-20210126-l2.1.dar [orion:/scratch/backups/]
         57317283 home-20210125-l3.1.dar [orion:/scratch/backups/]
     *  106608760 home-20210124-l1.1.dar [orion:/scratch/backups/]
         77181762 home-20210123-l6.1.dar [orion:/scratch/backups/]
         69923104 home-20210122-l7.1.dar [orion:/scratch/backups/]
         79237797 home-20210121-l4.1.dar [orion:/scratch/backups/]
         46002295 home-20210120-l5.1.dar [orion:/scratch/backups/]
         96332012 home-20210119-l2.1.dar [orion:/scratch/backups/]
         78037401 home-20210118-l3.1.dar [orion:/scratch/backups/]
         91735019 home-20210117-l1.1.dar [orion:/scratch/backups/]
         97052916 home-20210116-l6.1.dar [orion:/scratch/backups/]
         80359764 home-20210115-l7.1.dar [orion:/scratch/backups/]
         98385074 home-20210114-l4.1.dar [orion:/scratch/backups/]
         62601445 home-20210113-l5.1.dar [orion:/scratch/backups/]
         73749317 home-20210112-l2.1.dar [orion:/scratch/backups/]
         67564592 home-20210111-l3.1.dar [orion:/scratch/backups/]
         96643821 home-20210110-l1.1.dar [orion:/scratch/backups/]
        113825418 home-20210109-l6.1.dar [orion:/scratch/backups/]
         92711331 home-20210108-l7.1.dar [orion:/scratch/backups/]
        101974794 home-20210107-l4.1.dar [orion:/scratch/backups/]
         72500168 home-20210106-l5.1.dar [orion:/scratch/backups/]
         84543562 home-20210105-l2.1.dar [orion:/scratch/backups/]
     *   11687025 home-20210104-l0-cat.1.dar [orion:/scratch/backups/]
     * 1563907072 home-20210104-l0.1.dar [orion:/scratch/backups/]
     *   60423634 home-20210104-l0.2.dar [orion:/scratch/backups/]
         73820512 home-20210102-l6.1.dar [orion:/scratch/backups/]
         72067136 home-20210101-l7.1.dar [orion:/scratch/backups/]
    #
Note that the current backup files (the ones that would need to be
restored in order to recreate the most recent state of the filesystem)
are marked with a "*".
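
To recreate the most recent state of /home from the listing above, one would therefore extract just the starred archives, oldest first. With dar, that might look like the following sketch, where the restore destination is illustrative and -w suppresses the overwrite warnings (the -l0-cat file is presumably an isolated catalog of the full dump and is not needed for extraction):

    dar -x /scratch/backups/home-20210104-l0 -R /mnt/restore
    dar -x /scratch/backups/home-20210124-l1 -R /mnt/restore -w
    dar -x /scratch/backups/home-20210126-l2 -R /mnt/restore -w
    dar -x /scratch/backups/home-20210127-l5 -R /mnt/restore -w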
Copying dump files with vacuum.pl

vacuum.pl copies backup dump files from place to place,
being careful to copy only current backups, and checks for good copies
to guard against network corruption. Usage for this is

    vacuum.pl [--test] [--verbose] [--usage|-?] [--help]
              [--from=<source-dir>] [--to=<dest-dir>]
              [--mode=(mv|cp)] [--prefix=<tag> ... ]
              [--since=<date-string>] [--min-free-left=<size>]
See the documentation in the script for argument descriptions, known
bugs, and other details.

Tidying up with clean-backups.pl
When following the modified Tower of Hanoi
backup level scheme described above, daily backups (those with a
backup level of 2 or greater) contain only one or two days' worth of
data. Odd dailies (3, 5, 7, and 9) have only one day, and even dailies
(2, 4, 6, and 8) have two days -- that day plus that of the previous odd
daily. Consequently, it is less important to keep dailies around for
extended periods of time, so they can be deleted automatically when they
are no longer useful. This is what clean-backups.pl does: It maintains a specified minimum amount of free space on the partition that is used for backups by first removing odd dailies that are older than a specified threshold, starting with the oldest; if that does not free enough space, it then removes even dailies. No
output is generated except when clean-backups.pl fails to make
its quota, which makes it work well as a cron job; the sysadmin
gets an email when it's time to think about removing consolidated dumps.
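
For example, a crontab entry along the following lines (the time of day is arbitrary) on each system keeps the cleaning completely hands-off, relying on the default /etc/backup.conf configuration described below:

    30 04 * * * /usr/local/bin/clean-backups.pl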
Full and consolidated backups cover much longer time periods, and are
usually kept around for much longer. For this reason,
clean-backups.pl never deletes full or consolidated backups.
Usage is as follows:

    clean-backups.pl [ --conf=<config-file> ]
                     [ --[no]test ] [ --verbose ... ]
    clean-backups.pl [ --usage | --help ]
The default configuration file is /etc/backup.conf, which
tells which partitions to clean, and how thoroughly to clean them. Here
is an example:

    # Backup configuration.
    [scorpio:/scratch]
    min-free-space = 10
    min-odd-retention = 60
    min-even-retention = 120
    clean = home

    [orion:/scratch]
    min-free-space = 0.5
    min-odd-retention = 60
    min-even-retention = 120
    clean = home
Notice that the same configuration file is shared between two
systems. Minimum retention is specified as a number of days, and the
even and odd retention values are specified separately. The
min-free-space is specified in GiB, and the clean parameter tells which prefixes should be cleaned, in the event that multiple backup sets are stored on that partition.

Notes on mirroring backup
The weakness of mirroring backup is that it only gives you a single
archival time point from which to recover. Of course, this assumes that
you only make a single mirrored copy; multiple copies could get quite
expensive, so it's not surprising that I've never heard of anyone who
has actually done multiple mirrored copies, except possibly for Web
content.
The key parameter for mirroring backup is therefore the backup
frequency, which involves a tradeoff in the two different kinds of
recovery capability discussed above. If you
back up more frequently, then you will lose less in the event of a
catastrophic failure (i.e. a disk crash), but you will also have less
time in which to recover from file corruption or accidental deletion.
The extreme case of frequent backup is provided by RAID 1 (disk mirroring), in which the "backup" is transparent and so frequent as to be effectively instantaneous, and recovery from single-disk failure is likewise transparent, but there is
no archival history whatsoever. Having RAID is not the same as having a
backup!
Another example of continuous mirroring backup is
database replication. In a master-slave setup, a master database
server pushes changes to one or more slave servers; each server keeps a
copy of the data on its local disk(s), so that if the master server
fails, any of the slaves can be reconfigured to take over as the
replacement master. However, the same caution applies: Just because
your database is replicated doesn't mean that it's backed up!

Mirroring case history 1: Web server content
This Web server uses rsync to mirror the CMU Common Lisp download content
from common-lisp.net. The server
runs the following cron job once a day as root:

    cd /scratch/mirror
    rsync -avz common-lisp.net::project/cmucl/ cmucl > /root/cmucl-mirror-log.text
The -a switch requests archival copying; according to the
manual page, the -a option "... is a quick way of saying you
want recursion and want to preserve almost everything." The -v
switch makes it verbose (which is low-cost, as the content has few but
very large files, and doesn't change often), and -z means to
use compression in transit. This command contacts the
common-lisp.net rsync server and updates the contents
of the cmucl tree under /scratch/mirror/cmucl/ on my
server -- without bothering to copy anything that hasn't been changed.
You can browse the content at
http://www.rgrjr.com/cmucl/downloads/.

Mirroring case history 2: Full disk copy
rsync can also be used for disk-to-disk copying within a
single system. Here is how Anthony DiSante describes his backup system,
in which he uses rsync in lieu of archival backup:

    I use rsync for my weekly backups -- I've got two 120GB disks in my
    computer, and I have a 250GB disk in an external firewire enclosure.
    Once the external drive is mounted at /mnt/backup, all it takes is
    this simple command:

        rsync -a --delete --exclude /mnt/backup / /mnt/backup

    The -a switch is for archival copying, --exclude tells it not to copy
    the external drive onto itself, and --delete means to delete any files
    on the destination that no longer exist on the source. The result is
    that, when complete, the disk at /mnt/backup is an exact copy of my
    root filesystem (which includes both 120GB disks). rsync is of course
    known for its highly efficient remote-update algorithm whereby only
    the changes in files are transmitted; in practice, I find that my
    weekly backup takes about an hour to run on my 172GB of used space.
Note that a system-to-system backup of this magnitude might not take
much longer; probably not much of that 172GB changes from week to week,
so rsync would figure that out and would only transfer the
differences. Based on my experience, a full dump of 172GB
(uncompressed) would require 11 hours to transmit over a local 100BaseT
connection, so dealing with archival dumps of this size would be a pain.
Also, since the backup drive is removed after update, this setup can
be extended to use two or more identically-configured external drives,
which are updated in rotation. This requires no more effort than for a
single drive, but begins to provide some archival history, for those who
can afford the additional hardware.
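
For what it's worth, a system-to-system version of such a mirror is just a matter of pointing rsync at another machine over ssh, along the lines of the following sketch, where the host name and destination path are illustrative and the pseudo-filesystems are excluded:

    rsync -a --delete --exclude /proc --exclude /sys / backuphost:/backups/desktop/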
Acknowledgements
Thanks to Anthony DiSante <orders at nodivisions
dot com> for pointing out that I had neglected to mention
rsync; the resulting reorganization of the material has made
this page much more comprehensive.
Bob Rogers
<rogers@rgrjr.com>