Uniq command in Linux

1.0 uniq

The uniq command is a filter that collapses repeated lines. It reads its input, compares each line with the one before it, and prints a single copy of each run of identical lines. Used with the -D option, it does the inverse and prints only the duplicated lines. Because uniq compares adjacent lines only, the input must be sorted for all duplicates to be found. For example,

$ cat names
Jame Doe
Jane Doe
John Doe
Erika Mustermann
John Doe
John Doe
Max Mustermann
Richard Roe
Joe Bloggs
Tommy Atkins
John Roe
Jane Doe
John Doe
$
$ sort names | uniq
Erika Mustermann
Jame Doe
Jane Doe
Joe Bloggs
John Doe
John Roe
Max Mustermann
Richard Roe
Tommy Atkins
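
The need for sorted input follows from uniq comparing only neighbouring lines. The following is a minimal sketch of that behaviour; the three-line fruit input is made up for illustration and is not part of the names file.

```shell
# Hypothetical three-line input with a non-adjacent duplicate.
input=$(printf 'pear\napple\npear\n')

# Without sorting, the two "pear" lines are not adjacent, so both survive.
unsorted=$(printf '%s\n' "$input" | uniq)

# After sorting, the duplicates are adjacent and uniq collapses them.
sorted=$(printf '%s\n' "$input" | LC_ALL=C sort | uniq)

printf 'unsorted:\n%s\n\nsorted:\n%s\n' "$unsorted" "$sorted"
```

When only plain de-duplication is needed, sort -u gives the same result as sort | uniq.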

2.0 Print duplicates

We can find the duplicates with the -D and -d options. The -D option prints every instance of each duplicated line, whereas the -d option prints a single line for each group of duplicates.

$ sort names | uniq -D
Jane Doe
Jane Doe
John Doe
John Doe
John Doe
John Doe
$
$ sort names | uniq -d
Jane Doe
John Doe

3.0 Print unique lines only

The -u option drops every line that has a duplicate and prints only the lines that occur exactly once in the input.

$ sort names | uniq -u
Erika Mustermann
Jame Doe
Joe Bloggs
John Roe
Max Mustermann
Richard Roe
Tommy Atkins
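
The -d and -u options split the output of plain uniq into two disjoint sets. A small sketch of this, using made-up single-letter input rather than the names file:

```shell
# Hypothetical, already sorted input: "a" twice, "b" once, "c" three times.
data=$(printf 'a\na\nb\nc\nc\nc\n')

# -d: one line for each value that is repeated.
repeated=$(printf '%s\n' "$data" | uniq -d)

# -u: only the values that occur exactly once.
singles=$(printf '%s\n' "$data" | uniq -u)

printf 'repeated:\n%s\n\nsingles:\n%s\n' "$repeated" "$singles"
```

Every line printed by plain uniq appears in exactly one of the two results.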

4.0 Print count of occurrences

The -c option prefixes each output line with the number of times that line occurs in the input.

$ sort names | uniq -c
      1 Erika Mustermann
      1 Jame Doe
      2 Jane Doe
      1 Joe Bloggs
      4 John Doe
      1 John Roe
      1 Max Mustermann
      1 Richard Roe
      1 Tommy Atkins
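
A common use of the counts is to rank lines by frequency by piping them through a second, numeric sort. The sketch below assumes a small made-up list of names supplied with printf; the same pipeline should work on any line-oriented input.

```shell
# Hypothetical sample data; replace the printf with any line-oriented input.
ranked=$(printf 'John Doe\nJane Doe\nJohn Doe\nJoe Bloggs\nJohn Doe\nJane Doe\n' |
    LC_ALL=C sort |      # group identical lines together
    uniq -c |            # prefix each distinct line with its count
    LC_ALL=C sort -rn)   # order by count, highest first

printf '%s\n' "$ranked"
```

Here sort -rn compares the leading counts numerically and reverses the order, so the most frequent line comes first.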

5.0 Ignore case

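With the -i option, uniq ignores differences in letter case when comparing lines. The input is sorted with sort -f so that lines differing only in case end up next to each other. The output below appears to come from a copy of names in which one of the Jane Doe entries was written as Jane doe.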

$ sort -f names | uniq -ic
      1 Erika Mustermann
      1 Jame Doe
      2 Jane doe
      1 Joe Bloggs
      4 John Doe
      1 John Roe
      1 Max Mustermann
      1 Richard Roe
      1 Tommy Atkins
$
$ sort -f names | uniq -icd
      2 Jane doe
      4 John Doe
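
The effect of -i is easiest to see on input whose lines differ only in case. The following sketch uses made-up input in which such lines are already adjacent, so no sort is needed.

```shell
# Hypothetical input: three spellings of one name, then a different name.
lines=$(printf 'Jane Doe\njane doe\nJANE DOE\nJohn Doe\n')

# Case-sensitive comparison: all four lines are distinct.
exact_count=$(printf '%s\n' "$lines" | uniq | wc -l | tr -d ' ')

# Case-insensitive comparison: the three Jane spellings collapse into one.
folded_count=$(printf '%s\n' "$lines" | uniq -i | wc -l | tr -d ' ')

printf 'distinct lines: %s exact, %s ignoring case\n' "$exact_count" "$folded_count"
```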

6.0 Skip fields, characters

We can ask uniq to skip a number of leading blank-separated fields before comparing lines with the -f option. The following command sorts on the second field and then prints one line for each last name.

$ sort -k2,2 names | uniq -f1
Tommy Atkins
Joe Bloggs
Jame Doe
Erika Mustermann
John Roe
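
The -f option can be combined with -c to count how many lines share a last name. This is a sketch on a made-up list of "First Last" records, not the names file; the line printed for each group is the first one in sort order.

```shell
# Hypothetical "First Last" records.
by_last=$(printf 'John Doe\nJane Doe\nJohn Roe\nRichard Roe\nJoe Bloggs\n' |
    LC_ALL=C sort -k2,2 |   # bring equal last names together
    uniq -f1 -c)            # skip the first field, count each group

printf '%s\n' "$by_last"
```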

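Similarly, we can skip a number of characters at the beginning of each line with the -s option. In the example below, -s5 skips the four-character code and the space that follows it, so only the names are compared.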

$ cat class
AC12 John Doe
AC13 John Doe
RA11 Jane Doe
RA12 Jane Doe
AP12 John Roe
AL14 Richard Roe
AL15 Richard Roe
YM17 Tommy Atkins
AS12 Max Mustermann
PT14 Erika Mustermann
DE12 Joe Bloggs
$
$ sort -k2 class | uniq -s5
PT14 Erika Mustermann
RA11 Jane Doe
DE12 Joe Bloggs
AC12 John Doe
AP12 John Roe
AS12 Max Mustermann
AL14 Richard Roe
YM17 Tommy Atkins

We can limit the number of characters compared with the -w option. With -w4, only the first four characters of each line are compared, so in the following output John Roe is treated as a duplicate of John Doe and is dropped.

$ sort names | uniq -w4
Erika Mustermann
Jame Doe
Jane Doe
Joe Bloggs
John Doe
Max Mustermann
Richard Roe
Tommy Atkins
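
Since -w compares only a fixed-width prefix, it can group records by a leading key. The sketch below assumes GNU uniq (-w is not available in every implementation) and a made-up log whose lines start with an ISO date; comparing the first seven characters groups the lines by month.

```shell
# Hypothetical log lines, already ordered by date.
per_month=$(printf '2024-01-03 login\n2024-01-17 logout\n2024-02-02 login\n' |
    uniq -w7 -c)   # compare only "YYYY-MM", count lines per month

printf '%s\n' "$per_month"
```

The line shown for each month is the first one in that group.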