Quite often, we wish to know the differences between two versions of a file, or of a group of files that may be organized in a directory tree. We want to be sure that the work done is as intended and that no errors have been introduced inadvertently. The reliable way to do that is to review the differences between the initial version of the files, as they were when the work started, and the final version, after the changes have been made. Furthermore, to preclude errors in the comparison itself, the differences need to be found mechanically, by a program. diff is an excellent program, available on Linux and other Unix-like operating systems, for finding the differences between two versions of a set of files organized in a directory structure.
Suppose you have a directory tree, package-x.y.z, containing text files. You modify the files in the package and are ready to release a new version of the package. The modified files, comprising the latest version, are in the directory package-x.z.0. Before the release, you wish to compare the new version with the previous version. You can do that with the command,
$ diff -Nur package-x.y.z package-x.z.0 > diff_xyz_xz0
Comparing files with diff is meaningful mainly for text files; for a binary file, diff normally reports only that the two versions differ. The -N option tells diff to treat a file that is present in only one of the directories as if it were empty in the other. We will look at the -u option shortly. The -r option tells diff to recurse into the directories, looking for files to compare. The differences are redirected to the file diff_xyz_xz0. You could have redirected them to any other file, or simply let diff write to standard output.
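The effect of -N and -r is easy to try on a scratch copy. The following is a minimal sketch, not part of the original example: the directory names old and new, the file names and their contents are all made up for illustration. It also shows diff's exit status, which is 0 when no differences are found, 1 when differences are found and 2 when there is trouble.

```shell
#!/bin/sh
# Hypothetical scratch trees; old, new, notes.txt and extra.txt are made-up names.
work=$(mktemp -d)
cd "$work" || exit 1

mkdir -p old/docs new/docs
printf 'first line\nsecond line\n' > old/docs/notes.txt
printf 'first line\nsecond line, edited\n' > new/docs/notes.txt
printf 'only in the new tree\n' > new/extra.txt   # has no counterpart in old/

# Without -N, a file present on one side only is merely reported by name.
without_n=$(diff -ur old new) || true

# With -N, the absent file is treated as empty, so the whole content of
# new/extra.txt shows up as added lines. -r descends into docs/.
status=0
diff -Nur old new > diff_old_new || status=$?

echo "diff exit status: $status"   # 0 = no differences, 1 = differences, 2 = trouble
cat diff_old_new
```

Because an exit status of 1 only means that differences were found, a script that stops on errors should not treat it as a failure.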
diff can give its output in several formats, of which three are commonly used. The first is the normal format, which lists only the lines that differ. Usually you also wish to see the unchanged lines around the lines that have changed; the context format, selected with the -c option, provides that. The unified format is an improvement on the context format, as the redundant context lines are not shown. The -u option selects the unified format. We will use the unified format for our diff outputs.
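To see how the three formats differ, it helps to run them on the same pair of files. This is a small sketch with two made-up files, v1.txt and v2.txt, that differ in one line only; the file names and contents are assumptions for illustration.

```shell
#!/bin/sh
# Two hypothetical five-line files that differ only in the third line.
work=$(mktemp -d)
cd "$work" || exit 1
printf 'alpha\nbeta\ngamma\ndelta\nepsilon\n' > v1.txt
printf 'alpha\nbeta\nGAMMA\ndelta\nepsilon\n' > v2.txt

# diff exits with status 1 when the files differ, hence the "|| true".
normal=$(diff v1.txt v2.txt) || true       # normal format: changed lines only
context=$(diff -c v1.txt v2.txt) || true   # context format
unified=$(diff -u v1.txt v2.txt) || true   # unified format

printf '== normal ==\n%s\n\n' "$normal"
printf '== context (-c) ==\n%s\n\n' "$context"
printf '== unified (-u) ==\n%s\n' "$unified"
```

In the context output the unchanged lines appear twice, once for each file, with the changed line marked by an exclamation mark. The unified output shows each unchanged line once and marks the old and new forms of the changed line with - and +.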
Let's take a hypothetical example of two similar directories, a and b, which contain some source files and share a common parent directory. We go to the parent directory containing a and b and run diff as
$ diff -Nur a b > diff12
and let's assume we get the following output in diff12.
diff -Nur a/boxes.c b/boxes.c
--- a/boxes.c 2009-06-09 17:35:31.000000000 +0530
+++ b/boxes.c 2009-06-09 17:36:47.000000000 +0530
@@ -12,7 +12,7 @@
  gtk_init (&argc, &argv);
  window = gtk_window_new (GTK_WINDOW_TOPLEVEL);
- gtk_window_set_title (GTK_WINDOW (window), "Boxes");
+ gtk_window_set_title (GTK_WINDOW (window), "New Boxes");
  /*
  gtk_container_set_border_width (GTK_CONTAINER (window), 10);
diff -Nur a/new/tables.c b/new/tables.c
--- a/new/tables.c 2009-06-09 17:35:31.000000000 +0530
+++ b/new/tables.c 2009-06-09 17:37:19.000000000 +0530
@@ -12,7 +12,7 @@
  window = gtk_window_new (GTK_WINDOW_TOPLEVEL);
  gtk_window_set_title (GTK_WINDOW (window), "Tables");
  gtk_container_set_border_width (GTK_CONTAINER (window), 10);
- //gtk_widget_set_size_request (window, 400, 150);
+ gtk_widget_set_size_request (window, 400, 150);
  /* Connect the main window to the destroy and delete signals. */
  g_signal_connect (G_OBJECT (window), "destroy", G_CALLBACK (destroy), NULL);
The above diff output tells us that two files in directory a, a/boxes.c and a/new/tables.c, differ from the corresponding files in directory b, b/boxes.c and b/new/tables.c. The rest of the files are the same. Assuming b to be the newer version of the software, we can say that boxes.c and new/tables.c have changed between the two versions. Each line starting with @@ gives the position of a group of changes in the old and the new file. Lines removed from the old version are marked with - and lines added in the new version are marked with +. The unmarked lines are unchanged; they give the context of the changes so that you can locate the changed lines fast.
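When the differences file is long, it can help to pull out just the list of changed files or just the marked lines. The sketch below assumes two small made-up trees, a and b, with hypothetical files main.txt and sub/same.txt; it uses diff's -q option for a brief report and grep to filter the unified output.

```shell
#!/bin/sh
# Hypothetical trees: main.txt differs, sub/same.txt is identical in both.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p a/sub b/sub
printf 'line one\ntitle = "Boxes"\nline three\n' > a/main.txt
printf 'line one\ntitle = "New Boxes"\nline three\n' > b/main.txt
printf 'same in both\n' > a/sub/same.txt
printf 'same in both\n' > b/sub/same.txt

diff -Nur a b > diff12 || true   # status 1 only means differences were found

# Brief report: one line for each pair of files that differ.
changed=$(diff -qr a b) || true
echo "$changed"

# Only the marked lines: '-' came from a, '+' came from b.
# The second grep drops the '---' and '+++' file header lines. A removed line
# whose text itself starts with '-- ' would also be dropped by this simple filter.
changes=$(grep -E '^[-+]' diff12 | grep -Ev '^(---|\+\+\+) ')
echo "$changes"
```

The file sub/same.txt does not appear in either report because it is identical in both trees.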
If we wanted to send the newer version b of the software to someone who already has version a, it is not necessary to send the entire directory tree b. We can just send the differences file, diff12. The receiver, who has version a, goes to the directory a and, with diff12 available there, runs the patch command,
$ cd a
$ patch -p1 < diff12
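The whole exchange can be rehearsed in a scratch directory before sending anything. The following is a minimal sketch under stated assumptions: the file contents are made-up one-liners, the receiver directory is a hypothetical stand-in for the other person's machine, and the patch program is assumed to be installed.

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1

# Sender side: old tree a and new tree b (hypothetical contents).
mkdir -p a/new b/new
printf 'title = "Boxes"\n' > a/boxes.c
printf 'title = "New Boxes"\n' > b/boxes.c
printf '// size = 400x150\n' > a/new/tables.c
printf 'size = 400x150\n' > b/new/tables.c
diff -Nur a b > diff12 || true   # status 1 only means differences were found

# Receiver side: has a copy of a and the differences file, but not b.
mkdir receiver
cp -R a receiver/a
cp diff12 receiver/a/
cd receiver/a || exit 1
patch -p1 < diff12
rm diff12                        # remove the differences file from the tree again
cd "$work" || exit 1

# The patched copy should now have the same content as b.
if diff -r receiver/a b > /dev/null; then
    result="identical"
else
    result="different"
fi
echo "patched tree compared with b: $result"
```

Comparing the patched tree with b at the end is a cheap check that the differences file really carries everything needed to get from a to b.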
The -p1 option tells patch to strip the first component of the path from the file names found in diff12. In this case, for the file names in the lines
--- a/boxes.c 2009-06-09 17:35:31.000000000 +0530
+++ b/boxes.c 2009-06-09 17:36:47.000000000 +0530
...
--- a/new/tables.c 2009-06-09 17:35:31.000000000 +0530
+++ b/new/tables.c 2009-06-09 17:37:19.000000000 +0530
...
a/ and b/ would be stripped, and the files boxes.c and new/tables.c would be patched. In general, the -pnum option strips the smallest prefix containing num leading slashes from each file name in the patch file.
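The stripping rule can be illustrated without running patch at all. The helper below, strip_prefix, is a hypothetical shell function written only to mimic the rule for simple names; it is not how patch itself is implemented, and it does not try to handle special cases such as repeated slashes.

```shell
#!/bin/sh
# strip_prefix NUM NAME
# Print NAME with the smallest prefix containing NUM slashes removed.
# Hypothetical helper for illustration only; not patch's own code.
strip_prefix() {
    num=$1
    name=$2
    while [ "$num" -gt 0 ]; do
        case $name in
            */*) name=${name#*/} ;;   # drop everything up to the first slash
            *)   break ;;             # no slash left to strip
        esac
        num=$((num - 1))
    done
    printf '%s\n' "$name"
}

# Show what the file name from the patch header becomes for a few values of num.
for n in 0 1 2; do
    printf 'with -p%s: a/new/tables.c becomes %s\n' "$n" "$(strip_prefix "$n" a/new/tables.c)"
done
```

With num set to 1, both a/new/tables.c and b/new/tables.c reduce to new/tables.c, which is why the receiver runs patch from inside the directory a.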