How to take backups in Linux using the Command Line

  • Post last modified: June 8, 2024

Backups

Taking a backup involves making a copy of relevant files and storing it somewhere else, so that the data remains available even if something goes wrong with the original files. Backups help preserve data in the face of rare but possible hardware or system software failures.

1.0 Taking Backups

The files are organized in a tree structure on the secondary storage.

File system in Linux showing directories

In general, we need to take a backup of a node in the file system. A node is a directory with zero or more directories and files under it. If backups of multiple nodes are required, the process can be repeated for each node.

Suppose there are two nodes, src and dest, in the file system and we want to back up src to dest. There are two possibilities. First, we may want to replicate the entire directory tree of src at dest. Second, we may wish to create a “flattened” archive of src at dest. We will look at the commands for both cases.

2.0 Copying the directory structure

We can copy the entire directory structure of src under dest with the rsync command,

$ rsync -avz src dest

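Here, -a (archive) copies recursively and preserves permissions, timestamps and symbolic links, -v gives verbose output and -z compresses the data during transfer. The src directory itself, with everything under it, is copied into dest, giving dest/src. dest must be a directory; rsync creates it if it does not exist. However, if we wish to copy only the contents of src into dest, we add a trailing slash to src,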

$ rsync -avz src/ dest

If you wish to make dest an exact copy of src, the command is,

$ rsync -avz --delete src/ dest

The --delete option deletes any file that exists in dest but not in src. This is useful when dest is synchronized with src repeatedly and dest still holds files from an earlier rsync run that are no longer present in src.

If dest is on a remote host, you can include the user name and the host in the dest specification. For example, if dest is on the host with IP address 172.18.200.150 and the remote user is user1, we can give the command,

$ rsync -avz --delete src/ user1@172.18.200.150:dest

Since dest does not start with a “/”, it is taken to be relative to user1’s home directory. Also, if some of the files are big, we can monitor the progress of the transfer by including the --progress option,

$ rsync -avz --delete --progress src/ user1@172.18.200.150:dest

rsync is a smart file-copying program. It copies a file only when it is necessary to do so. By default, if the destination already has a file with the same size and modification time as the one at the source, that file is skipped. That way, both the amount of data transferred and the time taken for copying are minimized.

3.0 Creating an archive file

In many cases, replicating the source directory tree at the destination is not the desired result. Instead, we want the backup to be a single compact archive, from which a directory tree equivalent to the source can be re-created when required. We can use the tar command for this. tar stands for “tape archive”, as backups used to be taken on magnetic tape in the olden days.

Suppose we wish to take a backup of the src directory, which has the absolute path /home/user1/www/src. We can go to the parent www directory and give the tar command, as below.

$ cd /home/user1/www
$ tar -cvzf /home/user1/backup/src.tar.gz src

The directory backup must exist in /home/user1 before the tar command is given. The -c option is for “create” (a new archive), -v is for verbose output, -z is for gzip compression of the archive and -f is for the file name of the archive. The file name of the archive is given just after the -f option. The last argument is the name of the directory to be archived.
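
After creating an archive, it can be useful to check what went into it. The sketch below uses tar's -t option, which lists the members of an archive without extracting them. The scratch directory and the files index.html and css/site.css are made-up stand-ins for a real src tree.

```shell
#!/bin/sh
# Sketch: create an archive, then list its contents with -t.
# The www/src tree built here is a hypothetical stand-in.
set -eu
work=$(mktemp -d)
cd "$work"
mkdir -p www/src/css backup
echo "<h1>hi</h1>" > www/src/index.html
echo "body {}" > www/src/css/site.css

cd www
tar -czf ../backup/src.tar.gz src

# -t lists the archive members; nothing is written to disk.
listing=$(tar -tzf ../backup/src.tar.gz)
echo "$listing"
```

Every path in the listing starts with src/, because the archive was created from the parent directory.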

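We can restore the src directory tree from the src.tar.gz archive using the tar command. The tree is re-created under the current directory, so first change to the directory where src should appear.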

$ tar -xvzf /home/user1/backup/src.tar.gz

The -x option is for “extract”.
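
To restore somewhere other than the current directory, tar's -C option can be used; it makes tar change to the given directory before it works. The following is a minimal sketch in a scratch directory. The restore directory and file.txt are names assumed for the demonstration.

```shell
#!/bin/sh
# Sketch: extract an archive into a chosen directory with -C.
# restore/ and file.txt are hypothetical demo names.
set -eu
work=$(mktemp -d)
cd "$work"
mkdir -p www/src backup restore
echo "data" > www/src/file.txt

# -C www: archive src relative to www without an explicit cd.
tar -czf backup/src.tar.gz -C www src

# -C restore: re-create the tree as restore/src.
tar -xzf backup/src.tar.gz -C restore

# Compare the restored tree with the original.
diff -r www/src restore/src && echo "restored copy matches the original"
```

Extracting into a separate directory first also avoids overwriting a live src tree by accident.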

4.0 SHA256 checksum for the archive file

We need a way to check the integrity of our backup archive. For this, we can keep a record of the SHA256 checksum of the archive. The checksum is computed using the sha256sum command.

$ cd /home/user1/backup
$ sha256sum src.tar.gz > src.sha256

When in doubt, check the checksum for the archive using the sha256sum -c <checksum-file> command.

$ cd /home/user1/backup
$ sha256sum -c src.sha256
src.tar.gz: OK
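
The exit status of sha256sum -c can also be used in a script, for example to refuse to restore from a damaged archive. The sketch below works in a scratch directory; check_archive is a hypothetical helper written for this example, and the damage is simulated by appending a byte to the archive.

```shell
#!/bin/sh
# Sketch: use the exit status of "sha256sum -c" in a script.
# check_archive is a hypothetical helper, not a standard command.
set -eu
work=$(mktemp -d)
cd "$work"
mkdir src
echo "payload" > src/file.txt
tar -czf src.tar.gz src
sha256sum src.tar.gz > src.sha256

# Prints "ok" if the archive matches its recorded checksum, else "mismatch".
check_archive() {
    if sha256sum -c "$1" >/dev/null 2>&1; then
        echo "ok"
    else
        echo "mismatch"
    fi
}

before=$(check_archive src.sha256)
echo "intact archive: $before"

# Simulate damage by appending one byte to the archive.
printf 'x' >> src.tar.gz
after=$(check_archive src.sha256)
echo "damaged archive: $after"
```

Keeping a copy of the checksum file away from the archive itself makes it less likely that both are damaged together.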

Karunesh Johri

Software developer, working with C and Linux.