Uniq command in Linux

1.0 uniq

The uniq command is a filter that collapses repeated lines. It reads its input, compares each line with the one before it, and prints a single copy of each run of identical adjacent lines. Used with the -D option, it does the inverse and prints only the duplicated lines. Because uniq compares adjacent lines only, the input is normally sorted first so that identical lines end up next to each other. For example,

$ cat names
Jame Doe
Jane Doe
John Doe
Erika Mustermann
John Doe
John Doe
Max Mustermann
Richard Roe
Joe Bloggs
Tommy Atkins
John Roe
Jane Doe
John Doe
$
$ sort names | uniq
Erika Mustermann
Jame Doe
Jane Doe
Joe Bloggs
John Doe
John Roe
Max Mustermann
Richard Roe
Tommy Atkins
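
The reason for sorting first is that uniq only compares adjacent lines. The following is a minimal sketch of the difference, assuming a POSIX shell and a made-up three-line input (the fruit names are purely illustrative). LC_ALL=C is set only to make the sort order predictable.

```shell
# Hypothetical sample input: "apple" appears twice, but not on adjacent lines.
input=$(printf '%s\n' apple banana apple)

# Without sorting, the two "apple" lines are not adjacent, so both survive.
unsorted=$(printf '%s\n' "$input" | uniq)

# After sorting, the duplicates become adjacent and collapse into one line.
sorted=$(printf '%s\n' "$input" | LC_ALL=C sort | uniq)

printf 'unsorted:\n%s\n\nsorted:\n%s\n' "$unsorted" "$sorted"
```

When only the deduplicated list is needed, sort -u gives the same result as sort | uniq in a single step; uniq becomes necessary once options such as -c, -d or -u are involved.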

2.0 Print duplicates

We can find the duplicates with the -D and -d options. The -D option prints every instance of each duplicated line, whereas the -d option prints just one line for each group of duplicated lines.

$ sort names | uniq -D
Jane Doe
Jane Doe
John Doe
John Doe
John Doe
John Doe
$
$ sort names | uniq -d
Jane Doe
John Doe
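
To make the groups easier to see, GNU uniq also accepts the long form of -D with a delimiting method. The sketch below assumes GNU coreutils (the --all-repeated option is not part of POSIX) and uses a small, hypothetical subset of the names file built with printf.

```shell
# Hypothetical subset of the names file.
names=$(printf '%s\n' 'Jane Doe' 'John Doe' 'Erika Mustermann' 'John Doe' 'Jane Doe')

# -d: one representative line per duplicated group.
one_per_group=$(printf '%s\n' "$names" | LC_ALL=C sort | uniq -d)

# -D: every line that belongs to a duplicated group.
all_instances=$(printf '%s\n' "$names" | LC_ALL=C sort | uniq -D)

# GNU long form of -D; "separate" puts a blank line between groups.
grouped=$(printf '%s\n' "$names" | LC_ALL=C sort | uniq --all-repeated=separate)

printf '%s\n' "$grouped"
```

Here one_per_group holds two lines, all_instances holds four, and grouped holds the same four lines with an empty line between the Jane Doe and John Doe groups.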

3.0 Print unique lines only

The -u option drops every line that has a duplicate and prints only the lines that occur exactly once in the input.

$ sort names | uniq -u
Erika Mustermann
Jame Doe
Joe Bloggs
John Roe
Max Mustermann
Richard Roe
Tommy Atkins
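
One common use of -u is finding lines that appear in only one of two files. The sketch below is an illustration under stated assumptions: the two lists are hypothetical, they are written to a temporary directory, and each list is assumed to contain no duplicates of its own (otherwise a line repeated within one file would be dropped as well).

```shell
# Two hypothetical lists; each is assumed to contain no internal duplicates.
tmpdir=$(mktemp -d)
printf '%s\n' 'Jane Doe' 'John Doe' 'Joe Bloggs' > "$tmpdir/list_a"
printf '%s\n' 'John Doe' 'Joe Bloggs' 'Richard Roe' > "$tmpdir/list_b"

# Names present in both lists occur twice after merging, so -u drops them.
only_in_one=$(LC_ALL=C sort "$tmpdir/list_a" "$tmpdir/list_b" | uniq -u)

printf '%s\n' "$only_in_one"
rm -r "$tmpdir"
```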

4.0 Print count of occurrences

The -c option prefixes each output line with the number of times that line occurs in the input.

$ sort names | uniq -c
      1 Erika Mustermann
      1 Jame Doe
      2 Jane Doe
      1 Joe Bloggs
      4 John Doe
      1 John Roe
      1 Max Mustermann
      1 Richard Roe
      1 Tommy Atkins
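
The counts from -c are often fed into a second sort to rank lines by frequency. A minimal sketch, assuming a small hypothetical input and the usual sort -rn (reverse numeric) options:

```shell
# Hypothetical input with an uneven number of repeats per name.
names=$(printf '%s\n' 'John Doe' 'Jane Doe' 'John Doe' 'Max Mustermann' 'John Doe' 'Jane Doe')

# Count each distinct line, then sort numerically on the count, largest first.
ranked=$(printf '%s\n' "$names" | LC_ALL=C sort | uniq -c | LC_ALL=C sort -rn)

# Keep only the most frequent line; awk's default field splitting
# strips the leading padding and leaves "count first last".
top=$(printf '%s\n' "$ranked" | head -n 1 | awk '{print $1, $2, $3}')

printf '%s\n' "$ranked"
```

The first sort groups identical lines for uniq, and the second sort orders the counted result, so both are needed.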

5.0 Ignore case

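With the -i option, uniq compares lines case-insensitively, so lines that differ only in letter case are treated as duplicates. Sorting with sort -f, which folds lowercase to uppercase for comparison, keeps such lines adjacent.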

$ sort -f names | uniq -ic
      1 Erika Mustermann
      1 Jame Doe
      2 Jane Doe
      1 Joe Bloggs
      4 John Doe
      1 John Roe
      1 Max Mustermann
      1 Richard Roe
      1 Tommy Atkins
$
$ sort -f names | uniq -icd
      2 Jane Doe
      4 John Doe
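
The effect of -i is easier to see on input that really does contain mixed casing. The sketch below assumes a hypothetical four-line input. Which spelling uniq prints for a merged group depends on how sort orders the tied lines, so the example relies only on the counts.

```shell
# Hypothetical input in which the same name is spelled with different casing.
mixed=$(printf '%s\n' 'john doe' 'Jane Doe' 'JOHN DOE' 'John Doe')

# sort -f folds case so the variants end up adjacent; uniq -i then
# treats them as duplicates and -c reports how many were merged.
counted=$(printf '%s\n' "$mixed" | LC_ALL=C sort -f | uniq -ic)

# Without -i, every spelling counts as a distinct line.
distinct_case_sensitive=$(printf '%s\n' "$mixed" | LC_ALL=C sort | uniq | wc -l | tr -d ' ')

printf '%s\n' "$counted"
printf 'distinct lines without -i: %s\n' "$distinct_case_sensitive"
```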

6.0 Skip fields, characters

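We can ask uniq to skip a number of leading fields when comparing lines with the -f option. Fields are runs of non-blank characters separated by blanks. The following command sorts on the second field and then prints one line, the first in each group, for every distinct last name.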

$ sort -k2,2 names | uniq -f1
Tommy Atkins
Joe Bloggs
Jame Doe
Erika Mustermann
John Roe
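
Combining -f with -c gives a count per last name. This is a sketch on a small hypothetical input; the awk step at the end is only there to show the count next to the surname.

```shell
# Hypothetical subset of the names file.
names=$(printf '%s\n' 'Jane Doe' 'John Doe' 'Erika Mustermann' 'Max Mustermann' 'John Roe')

# Sort on the second field only, then skip the first field when comparing,
# so lines sharing a last name fall into one group; -c counts each group.
per_surname=$(printf '%s\n' "$names" | LC_ALL=C sort -k2,2 | uniq -f1 -c)

# Show just the count and the last name (fields 1 and 3 of the -c output).
summary=$(printf '%s\n' "$per_surname" | awk '{print $1, $3}')

printf '%s\n' "$summary"
```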

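Similarly, we can skip a number of characters at the beginning of each line with the -s option. In the following file, each line starts with a four-character code and a space, so -s5 makes uniq compare the names only.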

$ cat class
AC12 John Doe
AC13 John Doe
RA11 Jane Doe
RA12 Jane Doe
AP12 John Roe
AL14 Richard Roe
AL15 Richard Roe
YM17 Tommy Atkins
AS12 Max Mustermann
PT14 Erika Mustermann
DE12 Joe Bloggs
$
$ sort -k2 class | uniq -s5
PT14 Erika Mustermann
RA11 Jane Doe
DE12 Joe Bloggs
AC12 John Doe
AP12 John Roe
AS12 Max Mustermann
AL14 Richard Roe
YM17 Tommy Atkins

We can limit the number of characters compared with the -w option. In the following output, only the first four characters of each line are compared, so John Doe and John Roe count as duplicates and only the first of them is printed.

$ sort names | uniq -w4
Erika Mustermann
Jame Doe
Jane Doe
Joe Bloggs
John Doe
Max Mustermann
Richard Roe
Tommy Atkins
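
The -w option also combines with -c to count lines by prefix. The sketch below assumes GNU uniq (-w is a GNU extension) and a hypothetical subset of the class file; it counts entries per two-letter code prefix.

```shell
# Hypothetical subset of the class file.
codes=$(printf '%s\n' 'AC12 John Doe' 'RA11 Jane Doe' 'AC13 John Doe' 'RA12 Jane Doe' 'AP12 John Roe')

# Compare only the first two characters, so AC12 and AC13 fall into one group.
by_prefix=$(printf '%s\n' "$codes" | LC_ALL=C sort | uniq -w2 -c)

# Show the count and the first code seen in each group.
summary=$(printf '%s\n' "$by_prefix" | awk '{print $1, $2}')

printf '%s\n' "$summary"
```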