If you're working mostly on command line and dealing with a lot of text files every day, you should be aware of Uniq command. This command helps you to find repeated/duplicate lines from a file easily. It is not just for finding duplicates, but also to remove the duplicates, display the number of occurrences of the duplicate lines, display only the repeated lines and display only the unique lines etc. Since the uniq command is part of GNU coreutils package, it comes preinstalled in most Linux distributions. So, let us not bother with installation and see some practical examples.
Please note that the 'uniq' command will not detect repeated lines unless they are adjacent. So, you might need to sort them first or combine the sort command with uniq to get the results. Allow me to show you some examples.
First, let us create a file with some duplicate lines.
$ vi ostechnix.txt
welcome to ostechnix welcome to ostechnix Linus is the creator of Linux. Linux is secure by default Linus is the creator of Linux. Top 500 super computers are powered by Linux
As you see in the above file, we have few repeated lines (the first, second, third, and fifth lines are duplicates).
1. Remove consecutive duplicate lines in a file using Uniq command
If you use 'uniq' command without any arguments, it will remove all consecutive duplicate lines and display only the unique lines.
$ uniq ostechnix.txt
Sample output would be:
As you can see, uniq command removed all consecutive duplicate lines in the given file. You might also have noticed that the above output still has the duplicates in second and fourth lines. It is because the uniq command will only omit the repeated lines only if they are adjacent. We can, of course, remove that non-consecutive duplicates too. Look at the second example below.
2. Remove all duplicate lines
$ sort ostechnix.txt | uniq
Sample output would be:
See? There are no duplicates or repeated lines. In other words, the above command will display each line once from file ostechnix.txt. We used the sort command in conjunction with uniq, because, as I already mentioned, uniq will not find the duplicate/repeated lines unless they are adjacent.
3. Display only unique lines from a file
To display only the unique lines from a file, the command would be:
$ sort ostechnix.txt | uniq -u
Linux is secure by default Top 500 super computers are powered by Linux
As you can see, we have only two unique lines in the given file.
4. Display only duplicate lines
Similarly, we can also display duplicates lines from a file like below.
$ sort ostechnix.txt | uniq -d
Linus is the creator of Linux. welcome to ostechnix
These two are the repeated/duplicated lines in ostechnix.txt file. Please note that -d (small d) will only print duplicate lines, one for each group. To print all duplicate lines, use -D (capital d) like below.
$ sort ostechnix.txt | uniq -D
See the difference between both flags in the below screenshot.
5. Display number of occurrences of each line in a file
For some reason, you might want to check how many times a line is repeated in the given file. To do so, use -c flag like below.
$ sort ostechnix.txt | uniq -c
2 Linus is the creator of Linux. 1 Linux is secure by default 1 Top 500 super computers are powered by Linux 2 welcome to ostechnix
We can also display number of occurrences of each line along with that line, sorted by the most frequent using command:
$ sort ostechnix.txt | uniq -c | sort -nr
2 welcome to ostechnix 2 Linus is the creator of Linux. 1 Top 500 super computers are powered by Linux 1 Linux is secure by default
6. Limit the comparison to 'N' characters
Uniq command allows us to limit the comparison to a particular number of characters of lines in a file using -w flag. For example, let us limit the comparison to first 4 characters of lines in a file and display the repeated lines as shown below.
$ uniq -d -w 4 ostechnix.txt
7. Avoid the comparison with the first 'N' characters
Like limit comparison to N characters of lines in a file, we can also avoid comparing the first N characters using -s flag.
The following command will avoid the comparison with the first 4 characters of lines in a file:
$ uniq -d -s 4 ostechnix.txt
To avoid comparing the first N fields instead of characters, use '-f' flag in the above command.
For more details, refer the help section;
$ uniq --help
and man pages.
$ man uniq