rsync is a one-way file synchronization tool used on Linux and other Unix-like systems. It makes the destination files an exact replica of the corresponding files at the source, which makes it well suited to taking backups. At heart, rsync is a file copying tool, but a more efficient one than most because it uses delta encoding: for files that already exist at the destination, only the differences from the source are transferred.
One can rsync files locally on the same computer and also between the local computer and a networked remote host. rsync cannot copy directly between two remote hosts. If you rsync between the local computer (from where the rsync command is given) and a remote host, a server process must be running on the remote host to handle the file transfer. That server process can be the secure shell daemon, sshd, or the rsync server daemon, rsyncd.
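The delta transfer described above can be observed on a single machine. The following is a minimal sketch, assuming a scratch directory and the hypothetical file names src.bin and dst.bin. Because rsync copies whole files by default when both paths are local, the sketch passes --no-whole-file to force the delta algorithm and --stats to report how much data was matched at the destination versus sent literally.

```shell
# Scratch area; the file names below are illustrative only.
work=$(mktemp -d)

# Create about 1 MiB of data and an identical copy to act as the old destination.
head -c 1048576 /dev/urandom > "$work/src.bin"
cp "$work/src.bin" "$work/dst.bin"

# Change the source slightly so the two files differ.
echo "a few appended bytes" >> "$work/src.bin"

# Force the delta algorithm for a local copy and show the relevant statistics.
rsync --no-whole-file --stats "$work/src.bin" "$work/dst.bin" \
    | grep -E 'Literal data|Matched data'

# Confirm that the destination is now identical to the source.
cmp -s "$work/src.bin" "$work/dst.bin" && echo "destination matches source"
```

In the statistics, matched data is what rsync reused from the existing destination file, and literal data is what actually had to be sent.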
rsync COMMAND SYNTAX
rsync [OPTION...] SRC... [DEST]
Using with remote shell
rsync [OPTION...] [USER@]HOST:SRC... [DEST]
rsync [OPTION...] SRC... [USER@]HOST:DEST
Using with the rsync daemon
rsync [OPTION...] [USER@]HOST::SRC... [DEST]
rsync [OPTION...] rsync://[USER@]HOST[:PORT]/SRC... [DEST]
rsync [OPTION...] SRC... [USER@]HOST::DEST
rsync [OPTION...] SRC... rsync://[USER@]HOST[:PORT]/DEST
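The three syntax families differ only in how the source or destination argument is written. As a rough illustration, the helper below (classify_rsync_path is a hypothetical name, not part of rsync) guesses which transport an argument implies. It is a simplified sketch and assumes that a colon appearing after a slash belongs to a local file name.

```shell
# Hypothetical helper: report which transport an rsync path argument implies.
classify_rsync_path() {
    arg=$1
    case $arg in
        rsync://*) echo daemon; return ;;   # URL form always means the daemon
        *:*) ;;                             # contains a colon: inspect further
        *) echo local; return ;;            # no colon at all: a local path
    esac
    prefix=${arg%%:*}    # text before the first colon
    rest=${arg#*:}       # text after the first colon
    case $prefix in
        */*) echo local; return ;;          # slash before the colon: local name
    esac
    case $rest in
        :*) echo daemon ;;                  # HOST::MODULE
        *) echo remote-shell ;;             # HOST:PATH
    esac
}

classify_rsync_path a/
classify_rsync_path username@remote-host:abc
classify_rsync_path hostname::my_website
classify_rsync_path rsync://hostname/my_website
```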
rsync examples (local copy)
To set up an environment for trying rsync locally, let us create two sub-directories with two files each using the commands,
$ mkdir a
$ mkdir b
$ cd a
$ echo Hello World >x
$ echo Hello Moon >y
$ cd ../b
$ echo Hello Mars >p
$ echo Hello Jupiter >q
$ cd ..
In the end, we get the following files in the working directory,
a/x a/y b/p b/q
and we give the command,
$ rsync a b
rsync responds with
skipping directory a
and nothing happens. This is because directories are not recursed by default and rsync has just skipped the sub-directory a and done nothing. A little wiser, we give the command,
$ rsync -r a b
And the result is,
a/x a/y b/a b/a/x b/a/y b/p b/q
rsync has copied directory a (source) into directory b (destination). If instead we wanted the files of a copied directly into b, we could have given the command,
$ rsync -r a/ b
with the result,
a/x a/y b/x b/y b/p b/q
The effect of the trailing / after the source (a/) is that the contents of directory a are copied into directory b, rather than the directory itself. If you give the command rsync -r a with no destination, rsync only lists the contents of directory a, similar to the output of the ls -R command.
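To compare the two forms side by side without disturbing existing files, the following sketch repeats the experiment in a scratch directory. The destination names with_dir and contents_only are made up for this illustration.

```shell
# Scratch area with one source directory and two empty destinations.
work=$(mktemp -d)
mkdir -p "$work/a" "$work/with_dir" "$work/contents_only"
echo "Hello World" > "$work/a/x"
echo "Hello Moon" > "$work/a/y"

# No trailing slash: the directory a itself is created inside the destination.
rsync -r "$work/a" "$work/with_dir"

# Trailing slash: only the contents of a are placed in the destination.
rsync -r "$work/a/" "$work/contents_only"

# List the resulting files in both destinations.
find "$work/with_dir" "$work/contents_only" -type f | sort
```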
Having got a feel for rsync, it is time to look at some of the important options. These options apply to local as well as remote usage of rsync.
rsync – IMPORTANT OPTIONS
|-r, --recursive||recurse into directories|
|-l, --links||copy symlinks as symlinks|
|-p, --perms||preserve permissions|
|-t, --times||preserve modification times|
|-g, --group||preserve group|
|-o, --owner||preserve owner (super-user only)|
|--devices||preserve device files (super-user only)|
|--specials||preserve special files|
|-D||same as --devices --specials|
|-a, --archive||same as -rlptgoD|
|-u, --update||skip files that are newer in the destination|
|--delete||delete files in destination which are not there in source|
|--exclude=pattern||exclude files with names matching the pattern|
|--exclude-from=file||exclude files with names matching patterns given in the file|
|--include=pattern||include files with names matching the pattern|
|--include-from=file||include files with names matching patterns given in the file|
|--files-from=file||read source file names from file|
|-v, --verbose||print verbose output during execution|
|-h, --human-readable||display numbers in easy to read format|
|--progress||display progress of file transfer|
|-z, --compress||compress file data during the transfer|
|-e, --rsh=COMMAND||specify the remote shell to use|
These are some of the options; there are many more that allow rsync to be fine-tuned as required. Use the --delete option with caution, because it deletes files at the destination that do not exist at the source. However, if you want the destination to become an exact replica of the source directory tree, --delete is required. Taking advantage of some of these options, we can rewrite our rsync command to copy the contents of directory a into directory b as,
rsync -avz a/ b
And, if we wish to exclude the file foo in the directory a from being copied to directory b, the command would be,
rsync -avz --exclude=foo a/ b
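Since --delete removes files, it can help to preview a run first. The sketch below uses rsync's -n (--dry-run) option, which is not in the list above, to show what would be copied and deleted before doing it for real. The directory and file names (src, dst, keep.txt, stale.txt) are assumptions made for this illustration.

```shell
# Scratch area: the source holds keep.txt and foo, the destination a stale file.
work=$(mktemp -d)
mkdir -p "$work/src" "$work/dst"
echo keep > "$work/src/keep.txt"
echo skip > "$work/src/foo"
echo stale > "$work/dst/stale.txt"

# Preview only: -n reports the planned copies and deletions, changing nothing.
rsync -avn --delete --exclude=foo "$work/src/" "$work/dst"

# Real run: keep.txt is copied, foo is excluded, stale.txt is deleted.
rsync -a --delete --exclude=foo "$work/src/" "$work/dst"

ls "$work/dst"
```

Running the preview and the real command with identical options, apart from -n, keeps the preview faithful to what the real run will do.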
The --progress option is useful while transferring relatively large files. For example,
$ rsync -avz --progress * email@example.com:tmp
This gives the user some idea of the amount of data transferred for a file at any time and how much more time the transfer is expected to take. For example, for a file being transferred, a line like this is printed and updated during the transfer.
4194304 45% 105.89kB/s 0:00:47
This means that, for the file currently being transferred, 4194304 bytes have been transferred so far, which is 45% of the total file size. The file is being transferred at 105.89 kilobytes per second, and the transfer is expected to complete in another 47 seconds. When the transfer is complete, a line like this is printed.
9209698 100% 130.75kB/s 0:01:08 (xfer#1, to-check=1/2)
This means that the transferred file consisted of 9209698 bytes. The transfer ran at 130.75 kilobytes per second and took 1 minute and 8 seconds. The figures in parentheses indicate that it was the first file transferred in that rsync session and that one more file remained to be checked, out of a total of two files to be checked in that session.
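When progress lines are captured in a log, the fields described above can be pulled apart with a few lines of awk. The function below, progress_fields, is a hypothetical helper written for this article. It assumes the four-column layout shown in the sample lines, and the exact format may vary between rsync versions (for example, some releases print digit-grouping commas in the byte count).

```shell
# Hypothetical helper: split a --progress line into
# bytes, percent, rate, and time in seconds.
progress_fields() {
    printf '%s\n' "$1" | awk '{
        pct = $2
        sub(/%$/, "", pct)                 # drop the trailing percent sign
        split($4, t, ":")                  # h:mm:ss
        secs = t[1] * 3600 + t[2] * 60 + t[3]
        print $1, pct, $3, secs
    }'
}

progress_fields '4194304 45% 105.89kB/s 0:00:47'
progress_fields '9209698 100% 130.75kB/s 0:01:08 (xfer#1, to-check=1/2)'
```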
USING rsync WITH A REMOTE HOST
rsync needs to be installed on both the local and remote hosts. The local host, where the user gives the rsync command, is referred to as the client, while the remote host is the server. On the server, rsync is either spawned by the remote shell process (rsync --server --sender …; --server and --sender are internal options of rsync and are not meant to be used directly by users) or runs as the rsync daemon, rsyncd, started by a startup script.
When running rsync over a remote shell, the default shell is ssh. For other shells such as rsh, set the environment variable RSYNC_RSH or use the -e option in the rsync command. If you use the remote shell as transport, the host is specified by putting a single colon (:) after the hostname. To communicate with the rsync daemon, a double colon (::) after the hostname is required, or you may specify a URL using the syntax rsync://[USER@]HOST[:PORT]. Apart from specifying the user and host, the rest of the syntax for using rsync with a remote host is the same as for using rsync locally on a single computer.
USING RSYNC WITH REMOTE SHELL
Suppose there is a directory abc on the local host which we wish to synchronize to a remote host. The remote host must support remote shell access for this to work. Let’s say that the destination path is /home/username/abc. The command for synchronizing abc on the remote host from the local host is,
rsync -avz abc/ username@remote-host:abc
You can use this command to transfer a directory tree from a local PC to a website server. It will transfer the modified files only. And, if you wish to synchronize the local directory abc from the remote host’s abc, the command would be,
rsync -avz username@remote-host:abc/ abc
The -e option is useful if you wish to authenticate using a public and private key pair. If the file mykey.pem contains the private key, the command to upload the files in directory abc would be,
rsync -avz -e "ssh -i mykey.pem" abc/ username@remote-host:abc
This assumes that the corresponding public key has already been stored (appended) in the remote system under ~/.ssh/authorized_keys.
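The same remote shell command can be supplied through the RSYNC_RSH environment variable instead of -e. The wrapper below is only a sketch: sync_up and the RSYNC override variable are hypothetical names introduced here, and the override exists so that the command can be inspected (for example with RSYNC=echo) without contacting a remote host.

```shell
# Hypothetical helper: push the contents of a directory using a given private key.
# Set RSYNC=echo to print the arguments instead of starting a transfer.
sync_up() {
    key=$1
    src=${2%/}/      # force a trailing slash so the directory contents are synced
    dest=$3
    RSYNC_RSH="ssh -i $key" ${RSYNC:-rsync} -avz "$src" "$dest"
}

# Example (not run here): sync_up mykey.pem abc username@remote-host:abc
```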
SYNCHRONIZING WITH A REMOTE RSYNC DAEMON
For this, the rsync daemon needs to be running on the remote host. Running the rsync daemon requires a configuration file, /etc/rsyncd.conf, on the remote host. rsyncd.conf contains global settings such as the message-of-the-day (motd) file, the daemon IP address, and the port. Then, there are entries for modules. A module is a symbolic name for a directory tree on the server. When a client uses the syntax for communicating with the rsync daemon (using :: after the hostname or the URL form, rsync://), the first word after the hostname in the path is a module name. A sample /etc/rsyncd.conf file is,
# rsyncd.conf
# global parameters
motd file = /etc/rsyncd.motd
read only = no
#modules
[my_website]
comment = My website pages
path = /home/www/my_website
[my_blog]
comment = My blog pages
path = /home/www/my_blog
In the /etc/rsyncd.conf file, comments start with the # character. First, the global parameters are given. Then, there are parameters for each module. Here, we have entries for two modules, my_website and my_blog. The entry for a module starts with the module name in square brackets, followed by the parameters for that module.
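Because each module entry begins with its name in square brackets, the module names can be extracted from a configuration file with a one-line sed expression. The sketch below writes a copy of the sample configuration to a temporary file and lists its modules. The function name list_modules is an assumption made for this illustration.

```shell
# Write a copy of the sample configuration to a temporary file.
conf=$(mktemp)
cat > "$conf" <<'EOF'
# rsyncd.conf
# global parameters
motd file = /etc/rsyncd.motd
read only = no
#modules
[my_website]
comment = My website pages
path = /home/www/my_website
[my_blog]
comment = My blog pages
path = /home/www/my_blog
EOF

# Hypothetical helper: print the name of every [module] section in a config file.
list_modules() {
    sed -n 's/^\[\(.*\)\]$/\1/p' "$1"
}

list_modules "$conf"
```

Against a live daemon, giving rsync only the host part (for example, rsync hostname::) asks the daemon to list the modules it offers.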
With the remote host running the rsync daemon and having the above-mentioned /etc/rsyncd.conf configuration file, we can get the entire my_website directory tree to the local host using the command,
rsync -avz rsync://hostname/my_website .
The user should have read access to the files being pulled from the remote host. Also, note that even though we did not end the source (rsync://hostname/my_website) with a “/”, we got the individual sub-directories and files in the current directory on the local host and not the directory my_website, because a module reference does not need the trailing slash to copy the directory contents. Similarly, to put the local directory tree htdocs into the remote my_website, using the alternate rsync syntax for communicating with the rsync daemon, we have the command,
rsync -avz htdocs/ rsync://hostname/my_website
Of course, we need write access to the corresponding parts of the directory tree referenced by the module my_website on the remote host.
rsync versus Unison
How does rsync compare with Unison, another popular file synchronization tool? rsync is essentially a file copying tool with a clearly defined source and destination, so the file transfer occurs in one direction only during an invocation at the client. It is, of course, a high-performance and versatile file copying tool. Unison, on the other hand, is a file synchronization tool for cases where files might have changed at both the local and remote hosts. With Unison, the file transfer occurs in both directions, at the end of which the newest files end up at both hosts. rsync is useful for backups and disk mirroring, while Unison might be relevant in development environments where work is done at multiple hosts and we wish to synchronize them so that the latest software version becomes available at all hosts.