Backups with rsync
Now for another incredibly interesting post, this time about rsync.
The rsync man page includes a couple examples for using rsync to back up directory trees, but what do all those options do, really? And just as important, which options do I like best?
Example
rsync -Cavz . arvidsjaur:backup
- -C, --cvs-exclude: auto-ignore files in the same way CVS does [includes .svn/ subtrees]
- -a, --archive: archive mode; same as -rlptgoD (no -H)
- -r, --recursive: recurse into directories
- -l, --links: copy symlinks as symlinks
- -p, --perms: preserve permissions
- -t, --times: preserve times
- -g, --group: preserve group
- -o, --owner: preserve owner (root only)
- -D: preserve devices (root only); in current rsync releases this is shorthand for --devices --specials, so special files are preserved too
- -H, --hard-links: preserve hard links [“This option can be quite slow, so only use it if you need it.”]
- -v, --verbose: increase verbosity
- -z, --compress: compress file data during the transfer
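To watch a few of these options at work without touching real data, here is a minimal sketch that syncs a throwaway tree on the local machine. The file names (notes.txt, build.o, .svn/entries, latest) and the scratch directory are made up for illustration, and it assumes rsync and mktemp are installed.

```shell
#!/bin/sh
# Demo of -C and -a on a scratch directory (all names here are hypothetical).
command -v rsync >/dev/null 2>&1 || { echo "rsync not installed; skipping demo"; exit 0; }

work=$(mktemp -d)
mkdir -p "$work/src/.svn" "$work/dst"
echo "keep me"  > "$work/src/notes.txt"
echo "object"   > "$work/src/build.o"        # *.o is on the CVS ignore list
echo "metadata" > "$work/src/.svn/entries"   # .svn/ is skipped by -C as well
ln -s notes.txt "$work/src/latest"           # -l (part of -a) keeps this a symlink
touch -t 200001010000 "$work/src/notes.txt"  # -t (part of -a) keeps this timestamp

rsync -Ca "$work/src/" "$work/dst/"

# Only notes.txt and the latest symlink should show up here.
ls -la "$work/dst"

# Remove the scratch tree by hand when you are done: rm -rf "$work"
```

Keep in mind that -C also reads $HOME/.cvsignore and the CVSIGNORE environment variable, so your own ignore patterns can change what gets skipped.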
Other options of note:
- --delete: deletes files on the receiving side which aren’t on the sending side, but only if -r or -d is also included, and only for those directories which are being sync’d; always use --dry-run first to see what would be affected
- --exclude: any path which matches this simple filter isn’t sync’d
- --progress: show progress during transfer [don’t use inside cron]
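Since --delete is the option most likely to bite, here is a small sketch of the "dry run first" habit on scratch directories. The file names keep.txt and stale.txt are invented for the demo, and it assumes rsync and mktemp are available.

```shell
#!/bin/sh
# Preview a --delete run, then do it for real (scratch directories, made-up file names).
command -v rsync >/dev/null 2>&1 || { echo "rsync not installed; skipping demo"; exit 0; }

work=$(mktemp -d)
mkdir -p "$work/src" "$work/dst"
echo "current" > "$work/src/keep.txt"
echo "old"     > "$work/dst/stale.txt"   # exists only on the receiving side

# Dry run: rsync lists what it would delete but changes nothing.
preview=$(rsync -av --delete --dry-run "$work/src/" "$work/dst/")
echo "$preview"

if [ -e "$work/dst/stale.txt" ]; then
    survived_dry_run=yes
else
    survived_dry_run=no
fi
echo "stale.txt still present after dry run: $survived_dry_run"

# Real run: stale.txt is now removed from the destination.
rsync -av --delete "$work/src/" "$work/dst/"
```

In the verbose output, lines beginning with "deleting" are the ones to read carefully before dropping --dry-run.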
Compare rsync -avz foo:src/bar /data/tmp and rsync -avz foo:src/bar/ /data/tmp: the first will create /data/tmp/bar, the second will copy the contents of bar into /data/tmp. Without the trailing slash an additional directory level is created at the destination; with the trailing slash just the contents of the directory are copied. Thus, these are identical:
rsync -av /src/foo /dest
rsync -av /src/foo/ /dest/foo
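If you want to convince yourself of the trailing-slash rule, a quick local experiment along these lines should do it. The directory names dest1, dest2 and dest3 and the file a.txt are placeholders, and the sketch assumes rsync and mktemp are installed.

```shell
#!/bin/sh
# Trailing-slash experiment on scratch directories (all names are placeholders).
command -v rsync >/dev/null 2>&1 || { echo "rsync not installed; skipping demo"; exit 0; }

work=$(mktemp -d)
mkdir -p "$work/src/foo" "$work/dest1" "$work/dest2" "$work/dest3"
echo "hello" > "$work/src/foo/a.txt"

rsync -a "$work/src/foo"  "$work/dest1"       # no trailing slash: creates dest1/foo/
rsync -a "$work/src/foo/" "$work/dest2"       # trailing slash: contents land directly in dest2/
rsync -a "$work/src/foo/" "$work/dest3/foo"   # same layout as the first command

# Show where a.txt ended up in each case.
find "$work/dest1" "$work/dest2" "$work/dest3" -type f
```

Only the slash on the source matters here; a trailing slash on the destination does not change the result.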
So for me the command would look something like:
time rsync -Cavz --exclude 'research/gutenberg' --exclude 'research/output' --delete --dry-run /home/steve/data /media/usbdisk/steve/backup
rsync -Cavz --progress --delete --dry-run ~/music/ ideaharbor.org:~/music/
(Make sure you check free space with df -hT on your destination before you begin.)
There are two different ways rsync can connect to a remote system. Which method is used is controlled by the number of colons included in the source or destination path. With a single colon, as in the examples above, rsync uses ssh to connect to the remote machine as the specified user (by default, your local username). Hence anything it does there, such as creating new directories or writing to files, is done as that user. With two colons, rsync connects to the rsync daemon (rsyncd) running on the remote machine, and anything it does is limited by the permissions of the user rsyncd is running as. rsyncd is configured using the rsyncd.conf file (see `man rsyncd.conf` for details).
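As a rough illustration of the colon rule, the helper below guesses which transport a given argument would select. rsync_transport is a made-up function for this post, not part of rsync, and "module" is just a placeholder for whatever name the daemon exports. It assumes that only the text before the first slash can name a host, and it also treats rsync:// URLs as daemon connections.

```shell
#!/bin/sh
# Guess which transport rsync would pick for a source or destination argument.
# Illustrative helper only; rsync_transport is not part of rsync.
rsync_transport() {
    spec=$1
    case $spec in
        rsync://*) echo "daemon"; return ;;
    esac
    # Only the part before the first slash can name a host, so a colon
    # later in a local path does not count.
    host_part=${spec%%/*}
    case $host_part in
        *::*) echo "daemon" ;;
        *:*)  echo "remote shell" ;;
        *)    echo "local" ;;
    esac
}

rsync_transport "foo:src/bar"       # prints: remote shell
rsync_transport "foo::module/bar"   # prints: daemon
rsync_transport "/data/tmp"         # prints: local
```

This is only a sketch of the naming convention; rsync's own parsing handles more cases (user@host prefixes, IPv6 literals and so on).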