Sort command in Linux

1.0 Sort

The sort command sorts the lines of text files. For example, if we have a file called names, we can sort it with the sort command:

$ cat names
John Doe
Jane Doe
John Roe
Richard Roe
Tommy Atkins
Max Mustermann
Erika Mustermann
Joe Bloggs
$
$ sort names
Erika Mustermann
Jane Doe
Joe Bloggs
John Doe
John Roe
Max Mustermann
Richard Roe
Tommy Atkins

The whitespace-separated words on each input line are fields, numbered from 1. So, to sort on last names, we sort the above file on the second field; -k2,2 means that the key starts and ends at field 2.

$ sort -k2,2 names
Tommy Atkins
Joe Bloggs
Jane Doe
John Doe
Erika Mustermann
Max Mustermann
John Roe
Richard Roe

2.0 Sort order

Let's look at another example, consisting of famous quotations by Lewis Carroll, and its sorted output.

$ cat rquote
1. 
"The time has come,"
the Walrus said,
"To talk of many things:
Of shoes--and ships--
and sealing wax--
Of cabbages--and kings."
 Lewis Carroll

2.
"The White Rabbit put on his spectacles. 
'Where shall I begin, please your Majesty?' he asked.
'Begin at the beginning,' the King said gravely, 
'and go on till you come to the end: then stop.'"

 by Lewis Carroll: Alice in Wonderland
$
$ sort rquote


1. 
2.
'and go on till you come to the end: then stop.'"
and sealing wax--
'Begin at the beginning,' the King said gravely, 
 by Lewis Carroll: Alice in Wonderland
 Lewis Carroll
Of cabbages--and kings."
Of shoes--and ships--
"The time has come,"
the Walrus said,
"The White Rabbit put on his spectacles. 
"To talk of many things:
'Where shall I begin, please your Majesty?' he asked.

The output is not quite as expected: going by the first letter of each line, we have lowercase, then uppercase, and then lowercase again, while leading spaces and quotes seem to be ignored. This is not an error. The comparisons are done on the basis of the collating sequence of the current locale, which is specified by LC_COLLATE. To get plain byte-value (ASCII) ordering, we can set the environment variable LC_ALL=C. LC_ALL overrides LC_COLLATE and the other LC_* variables, so it is the more reliable one to set.

$ LC_ALL=C
$ export LC_ALL
$ sort rquote


 Lewis Carroll
 by Lewis Carroll: Alice in Wonderland
"The White Rabbit put on his spectacles. 
"The time has come,"
"To talk of many things:
'Begin at the beginning,' the King said gravely, 
'Where shall I begin, please your Majesty?' he asked.
'and go on till you come to the end: then stop.'"
1. 
2.
Of cabbages--and kings."
Of shoes--and ships--
and sealing wax--
the Walrus said,

Looking at the first character of each line, the empty lines come first, followed by the lines that start with a space (there is a space at the beginning of the lines containing the author's name). Next we have the double quotes, followed by the single quotes and then the digits. After that, we have the uppercase letters, followed by the lowercase letters. This matches the order of the ASCII character set.
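The locale can also be set for a single command instead of being exported for the whole session. The following is a minimal sketch using four lines taken from rquote; the variable names sample and c_sorted are only for illustration.

```shell
# Four lines from the rquote example, one per argument.
sample=$(printf '%s\n' 'the Walrus said,' '"The time has come,"' \
    'Of shoes--and ships--' ' Lewis Carroll')

# LC_ALL=C applies to this one sort invocation only; the shell's own
# locale settings are left unchanged.
c_sorted=$(printf '%s\n' "$sample" | LC_ALL=C sort)
printf '%s\n' "$c_sorted"
```

The lines come out in ASCII order of their first character: space, double quote, uppercase O, lowercase t.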

3.0 Sort in reverse order

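The -r option reverses the sort order so that larger key values appear earlier in the output. For example, sort -r -k2,2 names sorts names in reverse order of last name. Lines with equal keys are reversed as well, because the final whole-line comparison is also reversed.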

$ sort -r -k2,2 names
Richard Roe
John Roe
Max Mustermann
Erika Mustermann
John Doe
Jane Doe
Joe Bloggs
Tommy Atkins

4.0 Sort numerically

The -n option sorts by the numeric value of the key instead of comparing it character by character. For example,

$ cat attendance 
John Doe          12
Jane Doe           5
John Roe          25
Richard Roe        3
Tommy Atkins      14
Max Mustermann     2
Erika Mustermann  24
Joe Bloggs         7
$
$ sort -n -k3,3 attendance 
Max Mustermann     2
Richard Roe        3
Jane Doe           5
Joe Bloggs         7
John Doe          12
Tommy Atkins      14
Erika Mustermann  24
John Roe          25
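To see why -n matters, it may help to sort the same values both ways. This is a small sketch with made-up values; as_text and as_numbers are illustrative variable names.

```shell
# Three values that sort differently as text and as numbers.
values=$(printf '%s\n' 10 9 100)

# Without -n, keys are compared character by character, so "10" < "100" < "9".
as_text=$(printf '%s\n' "$values" | LC_ALL=C sort)

# With -n, keys are compared by numeric value.
as_numbers=$(printf '%s\n' "$values" | LC_ALL=C sort -n)

printf 'text order:    %s\n' "$(printf '%s' "$as_text" | tr '\n' ' ')"
printf 'numeric order: %s\n' "$(printf '%s' "$as_numbers" | tr '\n' ' ')"
# text order:    10 100 9
# numeric order: 9 10 100
```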

5.0 Sort in reverse numeric order

We can combine the -r option with -n to sort in reverse numeric order.

$ sort -nr -k3,3 attendance 
John Roe          25
Erika Mustermann  24
Tommy Atkins      14
John Doe          12
Joe Bloggs         7
Jane Doe           5
Richard Roe        3
Max Mustermann     2
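A common use of reverse numeric order is picking the top few entries by piping the result to head. The sketch below uses a subset of the attendance data; the variable names are illustrative.

```shell
# A subset of the attendance data, in the same layout.
attendance=$(printf '%s\n' \
    'John Doe          12' \
    'Jane Doe           5' \
    'John Roe          25' \
    'Erika Mustermann  24' \
    'Tommy Atkins      14')

# Sort on field 3, numerically and in reverse, then keep the first three lines.
top3=$(printf '%s\n' "$attendance" | LC_ALL=C sort -k3,3nr | head -n 3)
printf '%s\n' "$top3"
```

Here the n and r modifiers are attached to the key itself, which has the same effect as the global -nr when there is only one key.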

6.0 Sort folding lowercase to uppercase

The -f option does a case-insensitive sort, folding lowercase letters to uppercase before comparing, so that the two are treated as the same. For example, if some names have been typed in lowercase, we can still get the correct sort order using the -f option.

$ cat names
John Doe
jane doe
John Roe
richard roe
Tommy Atkins
Max Mustermann
erika mustermann
Joe Bloggs
$
$ sort -f -k2,2 names
Tommy Atkins
Joe Bloggs
John Doe
jane doe
Max Mustermann
erika mustermann
John Roe
richard roe
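The effect of -f is easiest to see next to a plain byte-order sort. A minimal sketch with made-up words follows; byte_order and folded are illustrative names.

```shell
words=$(printf '%s\n' apple Banana cherry)

# In the C locale, all uppercase letters sort before all lowercase letters.
byte_order=$(printf '%s\n' "$words" | LC_ALL=C sort)

# With -f, lowercase is folded to uppercase before comparing.
folded=$(printf '%s\n' "$words" | LC_ALL=C sort -f)

printf 'without -f: %s\n' "$(printf '%s' "$byte_order" | tr '\n' ' ')"
printf 'with -f:    %s\n' "$(printf '%s' "$folded" | tr '\n' ' ')"
# without -f: Banana apple cherry
# with -f:    apple Banana cherry
```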

7.0 Sort based on key

You can sort lines on one or more keys by giving the option -k POS1[,POS2] once for each key. POS1 and POS2 are the start and end positions of the key. If POS2 is omitted, the key runs from POS1 to the end of the line.

Each POS is defined as F[.C][OPTS]. F is the field number. Field numbers start with 1. C is the start or end character position of the key inside the field. Character positions also start with 1, which is the default value for the start position. The default value of C for the end position is the end of the field.

OPTS is one or more characters from the set { b d f g h i M n R r V }. These are the ordering options of the sort command, and attaching them to a key applies them to that key only.

Suppose we wish to sort names first on the last name and then on the first name:

$ sort -k2,2 -k1,1 names
Tommy Atkins
Joe Bloggs
Jane Doe
John Doe
Erika Mustermann
Max Mustermann
John Roe
Richard Roe
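The difference between giving and omitting POS2 can be seen by sorting the same input with -k2,2 and with -k2. This is a sketch with made-up three-field lines; the variable names are illustrative.

```shell
data=$(printf '%s\n' 'zed b 1' 'amy b 2' 'bob a 3')

# -k2,2: the key is field 2 only. The two "b" lines tie on the key, and
# the tie is broken by comparing the whole lines, so amy comes before zed.
bounded=$(printf '%s\n' "$data" | LC_ALL=C sort -k2,2)

# -k2: the key runs from field 2 to the end of the line, so the trailing
# numbers take part in the comparison and "b 1" comes before "b 2".
open_ended=$(printf '%s\n' "$data" | LC_ALL=C sort -k2)

printf -- '-k2,2:\n%s\n' "$bounded"
printf -- '-k2:\n%s\n' "$open_ended"
```

Writing the end position explicitly, as in -k2,2, keeps later fields from influencing the order unintentionally.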

As another example, consider sorting on the last name, then the first name, and then the numeric key, marks, in reverse order.

$ cat class
AC12 John Doe          112     Science
AC12 John Doe          132     Mathematics
RA11 Jane Doe           25     Art
RA11 Jane Doe          171     Craft
AP12 John Roe          123     Literature
AL14 Richard Roe        43     Language
AL14 Richard Roe       123     Literature
YM17 Tommy Atkins      126     Mathematics
AS12 Max Mustermann    121     Geography
PT14 Erika Mustermann  181     History
DE12 Joe Bloggs        171     Social Studies
$
$ sort -k3,3 -k2,2 -k4,4rn class
YM17 Tommy Atkins      126     Mathematics
DE12 Joe Bloggs        171     Social Studies
RA11 Jane Doe          171     Craft
RA11 Jane Doe           25     Art
AC12 John Doe          132     Mathematics
AC12 John Doe          112     Science
PT14 Erika Mustermann  181     History
AS12 Max Mustermann    121     Geography
AP12 John Roe          123     Literature
AL14 Richard Roe       123     Literature
AL14 Richard Roe        43     Language
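Character positions inside a field can be used in the same way. The sketch below sorts a few lines modeled on the class file by the two digits at the end of the code in the first field; choosing that part of the code as a key is only an illustration.

```shell
# Codes and names in the layout of the class file.
rows=$(printf '%s\n' \
    'AC12 John Doe' \
    'RA11 Jane Doe' \
    'AL14 Richard Roe' \
    'YM17 Tommy Atkins')

# Key: characters 3 to 4 of field 1, compared numerically.
by_digits=$(printf '%s\n' "$rows" | LC_ALL=C sort -k1.3,1.4n)
printf '%s\n' "$by_digits"
```

The lines come out in the order RA11, AC12, AL14, YM17.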

As another example, suppose there are a number of log files named log1.gz, log2.gz, ..., log100.gz, ..., log200.gz, and we want the directory listing sorted by number.

$ ls
log101.gz  log102.gz  log103.gz  log104.gz  log105.gz  log106.gz  log10.gz  log1.gz  log200.gz  log20.gz
$ ls | sort -t . -n -k1.4
log1.gz
log10.gz
log20.gz
log101.gz
log102.gz
log103.gz
log104.gz
log105.gz
log106.gz
log200.gz

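We define the field separator as "." (a period). The key starts at the fourth character of the first field. Since no end position is given, the key extends to the end of the line, but the numeric comparison reads only the leading number and stops at the first non-numeric character. This gives the proper sort order.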
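The key can also be given an explicit end position, so that only the digits in the first field are compared. This is a sketch on a made-up list of file names fed through printf instead of ls.

```shell
# File names as ls might print them, one per line.
files=$(printf '%s\n' log101.gz log10.gz log1.gz log200.gz log20.gz log2.gz)

# Separator ".", key from character 4 of field 1 to the end of field 1,
# compared numerically.
by_number=$(printf '%s\n' "$files" | LC_ALL=C sort -t . -k 1.4,1n)
printf '%s\n' "$by_number"
```

The result is log1.gz, log2.gz, log10.gz, log20.gz, log101.gz, log200.gz.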

8.0 Sort using a different field separator

By default, fields are separated by the transition from a non-blank character to a blank character, so the blanks before a field belong to that field. We can set a different separator with the -t option. For example, to get a sorted list of users from the /etc/passwd file:

$ sort -t : -k1,1 /etc/passwd | awk -F: '{ print $1 }'
avahi
avahi-autoipd
backup
bin
colord
daemon
...
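The -t option combines with the key modifiers described earlier. As a sketch, the lines below imitate the /etc/passwd layout (the entries are made up for illustration) and are sorted numerically on the third field, the user ID.

```shell
# Hypothetical passwd-style entries: name:password:UID:GID:comment:home:shell
entries=$(printf '%s\n' \
    'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' \
    'root:x:0:0:root:/root:/bin/bash' \
    'backup:x:34:34:backup:/var/backups:/usr/sbin/nologin' \
    'bin:x:2:2:bin:/bin:/usr/sbin/nologin')

# Sort on field 3 as a number, then print only the name and the UID.
by_uid=$(printf '%s\n' "$entries" | LC_ALL=C sort -t : -k3,3n | cut -d : -f 1,3)
printf '%s\n' "$by_uid"
```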

If the field separator is a tab, special syntax is required for specifying the delimiter, because -t expects a single literal character. In Bash, ANSI-C quoting, written as $'\t', produces a tab character. Suppose the first and last names are separated by a tab in names; we can sort on the last name as shown below.

$ cat names
John	Doe
Jane	Doe
John	Roe
Richard	Roe
Tommy	Atkins
Max	Mustermann
Erika	Mustermann
Joe	Bloggs
$ sort -t $'\t' -k2,2 names
Tommy	Atkins
Joe	Bloggs
Jane	Doe
John	Doe
Erika	Mustermann
Max	Mustermann
John	Roe
Richard	Roe

9.0 Check if file is sorted

You can quickly check whether a file is already sorted using the -c or the (uppercase) -C option. Both exit with status 0 if the input is sorted and with status 1 if it is not. The -c option also prints the first out-of-order line. The -C option checks silently: it prints no diagnostic and only sets the exit status. For example,

$ sort -c names
sort: names:2: disorder: Jane Doe
$ echo $?
1
$ sort -C names
$ echo $?
1
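Because -C reports its result only through the exit status, it fits naturally into an if statement. The following is a minimal sketch; check_sorted is a hypothetical helper, and the two files are created in a temporary directory just for the demonstration.

```shell
# Hypothetical helper: print "sorted" or "unsorted" for the given file,
# relying only on the exit status of sort -C.
check_sorted() {
    if LC_ALL=C sort -C "$1"; then
        echo sorted
    else
        echo unsorted
    fi
}

tmpdir=$(mktemp -d)
printf '%s\n' 'Jane Doe' 'John Doe' > "$tmpdir/good"
printf '%s\n' 'John Doe' 'Jane Doe' > "$tmpdir/bad"

good_status=$(check_sorted "$tmpdir/good")
bad_status=$(check_sorted "$tmpdir/bad")
echo "good: $good_status, bad: $bad_status"

rm -rf "$tmpdir"
```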

10.0 Output unique keys

With the -u option, only one line for each distinct key is written to the output. If multiple lines have the same key, only the first one occurring in the input is written to the output; the rest are discarded. For example,

$ cat xnames
Jame Doe
John Doe
Jane Doe
John Roe
Richard Roe
Tommy Atkins
Max Mustermann
Erika Mustermann
Joe Bloggs
$
$ sort -u -k2,2 xnames
Tommy Atkins
Joe Bloggs
Jame Doe
Max Mustermann
John Roe
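When no key is given, the whole line is the key, so -u removes duplicate lines. The sketch below, on made-up input, compares it with the familiar sort | uniq pipeline.

```shell
lines=$(printf '%s\n' Roe Doe Roe Atkins Doe)

# Sort and drop duplicate lines in one step.
unique=$(printf '%s\n' "$lines" | LC_ALL=C sort -u)

# The same result from sorting first and letting uniq drop adjacent repeats.
via_uniq=$(printf '%s\n' "$lines" | LC_ALL=C sort | uniq)

printf '%s\n' "$unique"
```

Both variables hold the three lines Atkins, Doe and Roe.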

11.0 Merge Files

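With the -m option, we can merge files that are already sorted, without sorting them again. For example, if names and names1 are already sorted, we can merge them as shown below. The merge follows the collating order of the current locale, and the output below reflects a locale other than C: with LC_ALL=C, John Roe would come before Johnny Roe, because a space sorts before the letter n in ASCII.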

$ cat names
Erika Mustermann
Jane Doe
Joe Bloggs
John Doe
John Roe
Max Mustermann
Richard Roe
Tommy Atkins
$
$ cat names1
Erica Mustermann
Jack Doe
Janney Doe
John Bloggs
Johnny Roe
Ray Mustermann
Richie Roe
Thomas Atkins
$
$ sort -m names names1
Erica Mustermann
Erika Mustermann
Jack Doe
Jane Doe
Janney Doe
Joe Bloggs
John Bloggs
John Doe
Johnny Roe
John Roe
Max Mustermann
Ray Mustermann
Richard Roe
Richie Roe
Thomas Atkins
Tommy Atkins
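A self-contained sketch of a merge follows, using two small files created in a temporary directory. The file names a and b and their contents are made up for the illustration, and LC_ALL=C is set so that the order is the same on every system.

```shell
tmpdir=$(mktemp -d)

# Two inputs, each already sorted in the C locale.
printf '%s\n' 'Jane Doe' 'Joe Bloggs' 'Tommy Atkins' > "$tmpdir/a"
printf '%s\n' 'Jack Doe' 'Ray Mustermann' > "$tmpdir/b"

# -m interleaves the inputs without sorting them again.
merged=$(LC_ALL=C sort -m "$tmpdir/a" "$tmpdir/b")
printf '%s\n' "$merged"

rm -rf "$tmpdir"
```

If an input file is not actually sorted, -m does not fix it; the output is then not guaranteed to be sorted either.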

12.0 Sort Options

The options for the sort command are summarized in a table below:

sort options

Option  Description
-b      Ignore leading blanks.
-d      Dictionary order. Only blanks and alphanumeric characters are considered.
-f      Ignore case. Fold lowercase to uppercase characters.
-g      General numeric sort. Converts numbers to floating point for comparison. Slower than the -n option, so use it only when -n is not enough.
-h      Human numeric sort. Sort first by sign, then SI suffix (blank, k, K or one of 'MGTPEZY') and finally by the numeric value.
-i      Ignore non-printing characters. Sort considering only printable characters.
-M      Month sort, where JAN < FEB < ... < DEC.
-n      Numeric sort. Sort considering the numeric value of strings.
-R      Random sort. Use a random hash function for input keys and then sort the hash values.
-r      Reverse the result of comparison. The greater key values come before the smaller ones.
-V      Version sort. Numbers embedded in the text are compared as version numbers, so that, for example, 1.9 comes before 1.10.
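Two of the less common options from the table can be tried as follows. This is a sketch on made-up input and assumes a sort that supports -M and -h, such as the one in GNU coreutils.

```shell
# -M: compare by month abbreviation instead of alphabetically.
by_month=$(printf '%s\n' MAR JAN DEC FEB | LC_ALL=C sort -M)

# -h: compare human-readable sizes, taking the suffix into account.
by_size=$(printf '%s\n' 2K 1G 3M 500 | LC_ALL=C sort -h)

printf 'months: %s\n' "$(printf '%s' "$by_month" | tr '\n' ' ')"
printf 'sizes:  %s\n' "$(printf '%s' "$by_size" | tr '\n' ' ')"
# months: JAN FEB MAR DEC
# sizes:  500 2K 3M 1G
```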