Synchronizing Files Between Multiple Computers Using Unison

As we work on different computers at different times, it becomes increasingly difficult to keep track of file versions. For example, you might be developing a new software product on a desktop computer while also keeping a development environment on a laptop for occasional work. At the end of the day, you want to synchronize the two copies. Unison is an excellent tool for this: it synchronizes the two directory trees of the software package, copying files modified on the laptop to the desktop and vice versa. If a file has changed on both the laptop and the desktop, Unison flags it and lets you choose the direction in which to copy it, that is, from desktop to laptop or from laptop to desktop.

INSTALLATION

On Debian-based systems such as Ubuntu, Unison can be installed with the command,

$ sudo apt-get install unison

CONCEPTS

REPLICA

A replica is a set of files organized in a directory tree, which needs to be synchronized with a similar tree at another location. That other location may be a node in the local file system or on a remote host.

ROOTS

A replica is identified by its root. The root gives the location of a replica, either as a node in the local file system or on a remote host. A root may be absolute or relative. An absolute root starts with /, the root of the file system. A local relative root is relative to the directory where Unison has been started; a remote relative root is relative to the user's home directory on the remote host. For example,

  • abc/def is a local relative root,
  • /home/abc/def is a local absolute root,
  • ssh://user@remotehost/abc/def is a remote relative root, and
  • ssh://user@remotehost//home/abc/def is a remote absolute root.
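
The four kinds of roots listed above can be told apart mechanically. The following is a minimal sketch, not part of Unison: classify_root is a hypothetical helper name, and it handles only the local and ssh:// forms shown in this article.

```shell
# classify_root ROOT
# Hypothetical helper: reports which of the four kinds of root ROOT is.
# Only plain local paths and ssh:// roots are handled.
classify_root() {
    case $1 in
        ssh://*)
            rest=${1#ssh://}            # user@host/path or user@host//path
            case $rest in
                */*) path=${rest#*/} ;; # text after the slash that ends the host part
                *)   path= ;;           # no path given at all
            esac
            case $path in
                /*) echo "remote absolute" ;;
                *)  echo "remote relative" ;;
            esac
            ;;
        /*) echo "local absolute" ;;
        *)  echo "local relative" ;;
    esac
}

classify_root abc/def
classify_root ssh://user@remotehost//home/abc/def
```

The double slash in a remote absolute root is simply the separator after the host name followed by the leading / of the absolute path.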

PATHS

A path refers to a point in a replica. A path may refer to

  • an ordinary file, or
  • a symbolic link, or
  • a directory, or
  • nothing at all in the replica.

A path is specified relative to the root. For example, if the root abc/def is being synchronized with another replica, then the path ghi/jkl refers to abc/def/ghi/jkl, which may be any one of the four possibilities mentioned above. An empty path denotes the root.

UPDATE

An update refers to a change in the contents of a path since Unison last synchronized the replica. If the path is an ordinary file, its contents are the actual bytes in the file together with its permission bits. If the path is a symbolic link, its contents are the string giving the link target. If the path is a directory, its contents are the token "DIRECTORY" along with the permission bits of the directory. If the path refers to nothing, that is, it is not a file, symbolic link or directory, its contents are the token "ABSENT".
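
To make the notion of "contents" concrete, here is a small sketch that prints a one-line summary of a path along the lines described above. It is only an illustration of the idea, not how Unison stores its archives: path_contents is a hypothetical name, the FILE and SYMLINK labels are invented for this example, and a cksum checksum stands in for the actual bytes of a file.

```shell
# path_contents PATH
# Hypothetical helper: prints a summary of the "contents" of PATH.
path_contents() {
    p=$1
    if [ -L "$p" ]; then
        # Symbolic link: the contents are the link target string.
        # Tested first because -d and -f follow symbolic links.
        printf 'SYMLINK %s\n' "$(readlink "$p")"
    elif [ -d "$p" ]; then
        # Directory: the token DIRECTORY plus the permission bits.
        printf 'DIRECTORY %s\n' "$(ls -ld "$p" | awk '{print $1}')"
    elif [ -f "$p" ]; then
        # Ordinary file: a checksum of the bytes plus the permission bits.
        printf 'FILE %s %s\n' "$(cksum < "$p" | awk '{print $1}')" \
            "$(ls -l "$p" | awk '{print $1}')"
    else
        printf 'ABSENT\n'
    fi
}

path_contents /
```

If the summary of a path differs from the one recorded at the last synchronization, the path has been updated in the sense used here.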

CONFLICT

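A path is conflicting if it has been updated in one replica, it (or one of its descendants) has also been updated in the other replica, and its contents in the two replicas are not identical. A path updated in only one replica is not a conflict; the change can simply be propagated to the other replica.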

PROCESSING

Based on an analysis of its archive files and the current contents of paths, Unison first identifies the paths that are the same in both replicas; for these, nothing needs to be done. The remaining paths have been updated and need to be synchronized. Updated paths fall into two groups. The first group consists of paths that are not conflicting, for example new files created in one replica since the last synchronization. For these, Unison suggests a default action to the user. The second group consists of updated paths that are conflicting. Unison lists the conflicting updates one by one, giving the user the opportunity to examine each path in more detail and choose an appropriate action.
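
The decision described above can be sketched for a single path as a three-way comparison between the contents recorded at the last synchronization and the current contents in each replica. This is a simplified model for illustration, not Unison's actual implementation; reconcile is a hypothetical name and the contents are passed in as plain strings.

```shell
# reconcile ARCHIVE A B
# ARCHIVE: contents of the path at the last synchronization
# A, B:    current contents of the path in replica a and replica b
reconcile() {
    archive=$1
    a=$2
    b=$3
    if [ "$a" = "$b" ]; then
        echo "nothing to do"      # replicas already agree
    elif [ "$a" = "$archive" ]; then
        echo "copy b to a"        # only b was updated
    elif [ "$b" = "$archive" ]; then
        echo "copy a to b"        # only a was updated
    else
        echo "conflict"           # both updated, contents differ
    fi
}

# A file that is new in replica a and still absent in replica b.
reconcile ABSENT "FILE 123" ABSENT
```

Only the last branch needs a decision from the user; the two "copy" branches correspond to the default actions Unison suggests.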

USAGE

The unison command syntax is,

$ unison [options]
$ unison root1 root2 [options]
$ unison profilename [options]
$ unison profilename root1 root2 [options]

We will get back to these command formats, but first let's look at the concepts of the .unison directory, preferences and profiles.

.unison DIRECTORY

Unison-related data is stored in the .unison directory. Its location depends on the UNISON environment variable. If the variable is set, its value is the name of this directory. Otherwise, the default is $HOME/.unison. The archive file for each replica and the profiles are stored in the .unison directory.
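
The lookup rule just described can be written in one line of shell. This is a small sketch for use in your own scripts; unison_dir is a hypothetical helper name, and note that the ${VAR:-default} form also falls back to the default when UNISON is set but empty.

```shell
# unison_dir
# Hypothetical helper: prints the directory where Unison keeps its data,
# following the rule above ($UNISON if set, otherwise $HOME/.unison).
unison_dir() {
    printf '%s\n' "${UNISON:-$HOME/.unison}"
}

echo "Unison directory: $(unison_dir)"
```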

PREFERENCES

It is possible to pass parameters to Unison to fine-tune its processing as per your requirements. These parameters are known as preferences. Some preferences are boolean flags, which are either set or not set. Other preferences take specific values. Some of the preferences are, for example,

  • auto - automatically accept default actions
  • backup xxx - add a pattern to backup list
  • ignore xxx - add a pattern to ignore list
  • log - record actions in file specified by logfile preference
  • logfile xxx - log file name
  • maxthreads n - maximum number of concurrent file transfers
  • path xxx - path to synchronize
  • root xxx - root of a replica

PROFILES

Preferences can be stored in a text file so that they need not be typed every time Unison is run. This text file is called a profile. Profiles are stored in the .unison directory. The default profile is default.prf. If Unison is started with no arguments on the command line, it looks in default.prf in the .unison directory for the roots of the two replicas and other preferences. If Unison is started with just one argument, say profilename, not counting options (which are preferences preceded by -), it looks for a profile named profilename.prf in the .unison directory.

LISTS

There are two lists that are important. These are the backup list and the ignore list. First, let us look at the backup list. When Unison is run, it overwrites files during synchronization. Using the backup preference, you can ask Unison to take a backup of a file before overwriting it. The preference is,

backup = pattern

where pattern can take one of three forms,

Regex regexp
Name name
Path path

In all cases, a backup is taken of any path that matches the pattern. The first form is the keyword Regex followed by a POSIX regular expression. In the second form, the last component of the path must match name. In the third form, the whole path must match path. Neither name nor path is a regular expression. They are globs, which means you can use characters like * and ? and the other globbing conventions familiar from listing files with the ls command in the shell.
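
The difference between the Name and Path forms can be illustrated with shell globs. The sketch below is an approximation under stated assumptions, not Unison's matcher: match_name and match_path are hypothetical names, and the path is matched one component at a time so that * does not cross a / separator. Unison's own glob rules differ in some details, for example in how names beginning with a dot are treated.

```shell
# match_name PATTERN PATH
# Succeeds if the last component of PATH matches the glob PATTERN.
match_name() {
    base=${2##*/}
    case $base in
        $1) return 0 ;;
        *)  return 1 ;;
    esac
}

# match_path PATTERN PATH
# Succeeds if the whole PATH matches PATTERN, component by component.
match_path() {
    pat=$1
    path=$2
    while :; do
        pc=${pat%%/*}      # first component of the pattern
        fc=${path%%/*}     # first component of the path
        case $fc in
            $pc) ;;
            *)   return 1 ;;
        esac
        case $pat in */*) pat_more=yes ;; *) pat_more=no ;; esac
        case $path in */*) path_more=yes ;; *) path_more=no ;; esac
        # Both must run out of components at the same time.
        [ "$pat_more" = "$path_more" ] || return 1
        [ "$pat_more" = no ] && return 0
        pat=${pat#*/}
        path=${path#*/}
    done
}

match_name '*.o' src/lib/foo.o && echo "Name *.o matches src/lib/foo.o"
match_path '*.o' src/lib/foo.o || echo "Path *.o does not match src/lib/foo.o"
```

In other words, Name *.o catches object files anywhere in the tree, while Path *.o would only catch those directly under the root.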

You can give multiple backup preferences to add to the backup list. There are some more preferences related to backup. The backuplocation preference determines whether backups are kept next to the files, in the same directory (backuplocation = local), or centrally in a designated directory (backuplocation = central). If backuplocation is set to central, backupdir gives the directory in which files are to be backed up. The backupprefix and backupsuffix preferences control the naming of backups. The maxbackups preference determines the number of backups to be kept; the default is 2.

Similarly, there is the ignore list. Replicas often contain less important files, such as temporary files, caches and big binaries downloaded from external sites, which you do not wish to synchronize. These can be excluded by using the ignore preference,

ignore = pattern

which causes all paths matching the pattern to be ignored in the synchronization process.

USAGE (REVISITED)

As mentioned above, Unison can be run from the shell by giving one of the following commands,

$ unison [options]
$ unison root1 root2 [options]
$ unison profilename [options]
$ unison profilename root1 root2 [options]

In all four forms, options are optional; they are preferences given on the command line, each prefixed by - and separated by spaces. Profiles, which are kept in the .unison directory, play a role in every form. In the first form, Unison synchronizes as per the default profile, default.prf, which must exist in the .unison directory. In the second form, Unison synchronizes the two replicas given by the roots root1 and root2, using the default.prf profile. If default.prf exists, it must not contain any root directives; if it does not exist, Unison creates a blank default.prf and proceeds. In the third form, Unison synchronizes as per the profile profilename.prf in the .unison directory. The fourth form is useful when you wish to synchronize two roots without using an existing default.prf: create an empty file named, say, nothing.prf and pass nothing as profilename. Of course, you could instead put some options in that profile and give it a more meaningful name, such as something.prf.
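
The way the four forms select a profile can be summarized in a few lines of shell. This is a sketch of the rule as described above, not Unison code: profile_for_args is a hypothetical name, and it expects only the positional arguments (the profile name and roots), with any options already removed.

```shell
# profile_for_args [profilename] [root1 root2]
# Hypothetical helper: prints the profile file that would be consulted.
profile_for_args() {
    case $# in
        0|2) echo "default.prf" ;;   # no profile name given
        1|3) echo "$1.prf" ;;        # first argument is the profile name
        *)   echo "too many arguments" >&2
             return 2 ;;
    esac
}

profile_for_args
profile_for_args nothing tmp ssh://user1@192.168.1.101//home1/user1/tmp
```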

As an example, let's try to synchronize the tmp directory in the home directory of the local system with the /home1/user1/tmp of a remote system. Since I have a default.prf profile in my .unison directory, which I do not want to use, I create an empty file named nothing.prf as a profile for this example.

$ >~/.unison/nothing.prf
$ unison nothing tmp ssh://user1@192.168.1.101//home1/user1/tmp
Contacting server...
user1@192.168.1.101's password:
Connected [//host1//home/user1/tmp -> //host2//home1/user1/tmp]
Looking for changes
Waiting for changes from server
Reconciling changes

local host2
new file abc [f] ?
Commands:
  <ret> or f or <spc>   follow unison's recommendation (if any)
  I                     ignore this path permanently
  E                     permanently ignore files with this extension
  N                     permanently ignore paths ending with this name
  m                     merge the versions
  d                     show differences
  x                     show details
  L                     list all suggested changes tersely
  l                     list all suggested changes with details
  p or b                go back to previous item
  g                     proceed immediately to propagating changes
  q                     exit unison without propagating any changes
  /                     skip
  > or .                propagate from from local to host2
  < or ,                propagate from from host2 to local
new file <==== abc [f] <
new file <==== bar [f] <
new file <==== te [f] <
<---- new file y [f] <

Proceed with propagating updates? [] y
Propagating updates

UNISON 2.40.65 started propagating changes at 18:06:10.21 on 14 Oct 2012
[BGN] Updating file x from //host2//home1/user1/tmp to /home/user1/tmp
[BGN] Copying y from //host2//home1/user1/tmp to /home/user1/tmp
Shortcut: copied /home/user1/tmp/y from local file /home/user1/tmp/.unison.x.fe35b335fb2bfd033e2f977d51ab0c89.unison.tmp
[END] Copying y
[END] Updating file x
[BGN] Deleting abc from /home/user1/tmp
[BGN] Deleting bar from /home/user1/tmp
[BGN] Deleting te from /home/user1/tmp
[END] Deleting abc
[END] Deleting bar
[END] Deleting te
UNISON 2.40.65 finished propagating changes at 18:06:10.60 on 14 Oct 2012

Saving synchronizer state
Synchronization complete at 18:06:10 (5 items transferred, 0 skipped, 0 failed)

In the above example, I have preferred the files on the remote machine over those on the local machine. As such, files are copied from host2 to host1, and files which are present on host1 but not on host2 are deleted from host1. It is important to note that although Unison makes a recommendation, it is the user who controls the synchronization. As the help message shows, the user can choose to copy files in either direction, ignore a path, see differences, and so on.

Obviously, you would like to have a profile so as to minimize typing for periodic synchronization. Let's say you wish to synchronize your home directory between your laptop and desktop computers. Assuming the laptop is the client from which the unison command is given, you may wish to have a profile named default.prf in your .unison directory, something like this,

# Roots for synchronization
root = /home/me
root = ssh://me@192.168.1.2//home/me

# paths to synchronize
path = src
path = doc
path = www

# Ignore some names and paths
ignore = Name tmp*
ignore = Name temp*
ignore = Name .*
ignore = Path */new/copy/src/*
ignore = Name *.o

# keep backups
backuplocation = central
backupdir = /home/me/backups
backup = Name *
backupprefix = $VERSION.

# keep log
log = true

# skip asking for confirmations for non-conflicting changes
auto = true

In the above profile, some paths are ignored. But quite often, some very important files lie in these paths, without which a software package might just not work. Also, disk space is not that scarce these days. So it makes sense to ignore nothing and synchronize full directory trees. Deleting the ignore lines gives the profile,

# Roots for synchronization
root = /home/me
root = ssh://me@192.168.1.2//home/me

# paths to synchronize
path = src
path = doc
path = www

# keep backups
backuplocation = central
backupdir = /home/me/backups
backup = Name *
backupprefix = $VERSION.

# keep log
log = true

# skip asking for confirmations for non-conflicting changes
auto = true

With a profile like this, you can synchronize by simply giving the command,

$ unison

UNISON VERSUS RSYNC

rsync is a popular file synchronization command. So which should you use, Unison or rsync? The answer lies in the fact that rsync synchronizes one way, from a source to a destination, whereas Unison synchronizes files both ways. When we are sure of the source files and want the destination to become an exact replica of the source, rsync should be used. A typical example is taking backups, where rsync is widely used. But when files have changed at both ends, the local and the remote host, and we wish to have the latest files at both places, Unison should be used.

SYNCHRONIZING MORE THAN TWO COMPUTERS WITH UNISON

Unison works between a pair of computers. How, then, do we synchronize three computers, or any number of computers? Conceptually, whenever files change on one system, all other systems should be synchronized to it. From the usage point of view, it is easier to designate one computer as the main server and always synchronize every other computer in the setup with the main server only. The main server then holds the updates of all computers, and changes made on any system percolate to all other systems. Finally, to guard against a breakdown of the main server, the main server can be thought of as a group of servers that synchronize with each other more frequently.
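
One way to drive such a setup is from the main server itself, synchronizing its local copy with each of the other computers in turn. The sketch below only prints the commands so that they can be reviewed before being run. It makes several assumptions: plan_hub_sync is a hypothetical helper, the host names laptop and desktop are placeholders, the directory is given relative to the home directory on every machine, and the -batch preference is used so that Unison asks no questions.

```shell
# plan_hub_sync DIR CLIENT...
# Hypothetical helper: prints the unison commands to run on the main
# server, from its home directory, to synchronize DIR with every client.
plan_hub_sync() {
    dir=$1
    shift
    # Two passes: the second lets clients synchronized early receive
    # changes collected from clients synchronized later.
    for pass in 1 2; do
        for client in "$@"; do
            printf 'unison %s ssh://%s/%s -batch\n' "$dir" "$client" "$dir"
        done
    done
}

plan_hub_sync src laptop desktop
```

In batch mode Unison propagates only the non-conflicting changes, so any conflicts still have to be resolved in an interactive run.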

REFERENCE

Unison File Synchronizer, User Manual and Reference Guide

SEE ALSO

rsync command
