Master the Art of Text File Comparison in the Linux Terminal

Efficiently compare and highlight disparities between text files using the powerful diff command in Linux and macOS. Explore useful commands, add color to output, provide contextual information, and ignore whitespace and case for accurate comparisons.

Key Takeaways

The diff command is used to compare two files and display the differences between them, including changes, deletions, and additions.

The diff command provides line numbers and labels to indicate the specific type of difference (change, deletion, or addition) in its output.

There are several options that can be used with the diff command, including the ability to display a concise statement about the differences between files, view them side by side, ignore white space and case differences, and even provide additional context for the detected differences.

Need to see the differences between two revisions of a text file? Then diff is the command you need. We'll show you how to use diff on Linux and macOS, the easy way.

What is the diff Command?

The diff command is used to compare two files and generate a list of the discrepancies between them. Its purpose is to indicate the modifications required in the first file in order to align it with the second file. By keeping this in mind, it becomes simpler to comprehend the output provided by the diff command. Originally created to identify disparities in source code files, it was designed to produce an output that can be readily interpreted and utilized by other programs, such as the patch command. This tutorial will explore the most beneficial approaches for utilizing diff in a user-friendly manner.

To begin, let's analyze two files. The order in which the files are specified on the command line determines their designation as the 'first file' and the 'second file' for the purpose of the comparison. In the given example, alpha1 is designated as the first file, while alpha2 is considered as the second file. Both files consist of the phonetic alphabet, but alpha2 has undergone some additional editing, resulting in non-identical content.

To compare the files, enter the following command: diff [first file name] [second file name], and then press Enter.

diff alpha1 alpha2
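
The contents of alpha1 and alpha2 are not listed in this article, so the following is only a sketch for readers who want to follow along. It assumes alpha1 holds the 26 NATO phonetic alphabet words, one per line, and builds alpha2 from it using the edits described in the walkthrough of the output. The exact spellings and the scratch directory are assumptions.

```shell
# Work in a scratch directory so no existing files are touched.
cd "$(mktemp -d)" || exit 1

# Assumed content of alpha1: one phonetic alphabet word per line.
printf '%s\n' Alpha Bravo Charlie Delta Echo Foxtrot Golf Hotel India \
    Juliett Kilo Lima Mike November Oscar Papa Quebec Romeo Sierra Tango \
    Uniform Victor Whiskey X-ray Yankee Zulu > alpha1

# Assumed content of alpha2: change lines 4 and 12, drop line 21,
# then append three extra lines.
sed -e '4s/.*/Dave/' -e '12s/.*/Linux/' -e '21d' alpha1 > alpha2
printf '%s\n' Quirk Strange Charm >> alpha2

# diff exits with status 1 when the files differ; "|| true" keeps a
# strict script (set -e) from stopping here.
diff alpha1 alpha2 || true
```

With these files, the labels printed are 4c4, 12c12, 21d20, and 26a26,28.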

The output is simple to read once you know what to look for. Each difference is listed in turn in a single column, and each one has a label. The label consists of numbers on either side of a letter, such as 4c4. The first number is the line number in alpha1, and the second number is the line number in alpha2. The letter in the middle means:

c: The line in the first file necessitates modification to align with the line in the second file.

d: The line in the first file must be deleted to match the second file.

a: Extra content must be added to the first file to make it match the second file.

The 4c4 in our example indicates that line four of alpha1 needs to be changed to match line four of alpha2. This is the first difference that diff found.

Lines beginning with "<" pertain to the first file (e.g., alpha1), while lines starting with ">" correspond to the second file (alpha2). The line "< Delta" signifies that the word "Delta" is the content of line four in alpha1. Conversely, the line "> Dave" indicates that the word "Dave" is the content of line four in alpha2. In summary, we must substitute "Delta" with "Dave" on line four in alpha1 to ensure alignment between the two files.

The 12c12 indicates the next change. Following the same reasoning, it implies that line 12 in alpha1 contains the term Lima, whereas line 12 in alpha2 contains Linux.

The third change relates to a line that has been removed from alpha2. The label 21d20 can be interpreted as "line 21 should be deleted from the first file for both files to be in sync from line 20 onwards." The < Uniform line displays the content of the line that must be deleted from alpha1.

The fourth difference is labeled 26a26,28. This change refers to three extra lines that have been added to alpha2. Take note of the 26,28 in the label. A pair of line numbers separated by a comma represents a range of line numbers. In this instance, the range runs from line 26 to line 28. The label should be read as "after line 26 in the first file, add lines 26 to 28 from the second file." We are shown the three lines in alpha2 that need to be added to alpha1. These lines contain the words Quirk, Strange, and Charm.
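
Because this output is designed to be read by programs such as patch, the changes it describes can also be applied automatically. A minimal sketch, assuming patch is installed and using recreated sample files. The file contents and the names alpha1.patched and changes.diff are illustrative assumptions, not part of the original example.

```shell
# Recreate the assumed sample files in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' Alpha Bravo Charlie Delta Echo Foxtrot Golf Hotel India \
    Juliett Kilo Lima Mike November Oscar Papa Quebec Romeo Sierra Tango \
    Uniform Victor Whiskey X-ray Yankee Zulu > alpha1
sed -e '4s/.*/Dave/' -e '12s/.*/Linux/' -e '21d' alpha1 > alpha2
printf '%s\n' Quirk Strange Charm >> alpha2

# Save the list of differences (diff exits with 1 because the files differ).
diff alpha1 alpha2 > changes.diff || true

# Apply the changes to a copy of the first file, leaving alpha1 untouched.
cp alpha1 alpha1.patched
patch alpha1.patched changes.diff

# The patched copy should now match the second file.
diff -s alpha1.patched alpha2
```

Working on a copy keeps the original first file available in case the result is not what was expected.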

Useful single-line Commands with diff

If all you want to know is whether two files are the same, use the -s (report identical files) option.

diff -s alpha1 alpha3

You can use the -q (brief) option to get an equally terse statement about two files being different.

diff -q alpha1 alpha2

One thing to watch out for is that with two identical files the -q (brief) option completely clams up and doesn't report anything at all.
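
In a script, the exit status is often more convenient than the printed message: diff exits with 0 when the files match, 1 when they differ, and 2 if something went wrong. The sketch below assumes alpha3 is an exact copy of alpha1 and uses recreated sample files, so the file contents are assumptions.

```shell
# Recreate the assumed sample files in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' Alpha Bravo Charlie Delta Echo Foxtrot Golf Hotel India \
    Juliett Kilo Lima Mike November Oscar Papa Quebec Romeo Sierra Tango \
    Uniform Victor Whiskey X-ray Yankee Zulu > alpha1
sed -e '4s/.*/Dave/' -e '12s/.*/Linux/' -e '21d' alpha1 > alpha2
printf '%s\n' Quirk Strange Charm >> alpha2
cp alpha1 alpha3    # assumption: alpha3 is identical to alpha1

diff -s alpha1 alpha3            # Files alpha1 and alpha3 are identical
diff -q alpha1 alpha2 || true    # Files alpha1 and alpha2 differ
diff -q alpha1 alpha3            # prints nothing

# Branch on the exit status instead of parsing the message.
if diff -q alpha1 alpha2 > /dev/null; then
    echo "alpha1 and alpha2 match"
else
    echo "alpha1 and alpha2 differ"
fi
```

Combining -q and -s in one command gives a one-line answer in both cases.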

An Alternative View of diff

The -y (side by side) option uses a different layout to show the differences between files. It is usually worth adding the -W (width) option to limit the number of columns displayed, which avoids awkward wrapped lines that make the output hard to read. Here, diff is told to produce a side-by-side display of alpha1 and alpha2, limited to 70 columns:

diff -y -W 70 alpha1 alpha2

The first file on the command line, alpha1, is shown on the left, and the second file, alpha2, is shown on the right. The lines from each file are displayed next to each other, with indicator characters marking the lines that have been changed, deleted, or added in alpha2.

|: A line that has been changed in the second file.

<: A line that has been deleted from the second file.

>: The second file includes a line that is not present in the first file.

For a condensed side by side comparison of the file discrepancies, you can utilize the --suppress-common-lines option. This option compels diff to display only the modified, added, or deleted lines.

diff -y -W 70 --suppress-common-lines alpha1 alpha2
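
To get a feel for how much the --suppress-common-lines option trims, the number of rows printed can be counted with and without it. This is a sketch using recreated sample files whose contents are assumptions, so the counts apply only to those files.

```shell
# Recreate the assumed sample files in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' Alpha Bravo Charlie Delta Echo Foxtrot Golf Hotel India \
    Juliett Kilo Lima Mike November Oscar Papa Quebec Romeo Sierra Tango \
    Uniform Victor Whiskey X-ray Yankee Zulu > alpha1
sed -e '4s/.*/Dave/' -e '12s/.*/Linux/' -e '21d' alpha1 > alpha2
printf '%s\n' Quirk Strange Charm >> alpha2

# Every line pair gets a row: 23 unchanged, 2 changed, 1 deleted, 3 added.
diff -y -W 70 alpha1 alpha2 | wc -l                            # 29

# Only the rows marked with |, < or > remain.
diff -y -W 70 --suppress-common-lines alpha1 alpha2 | wc -l    # 6
```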

Add a Splash of Color To diff Output

Another utility called colordiff adds color highlighting to the diff output. This makes it much easier to see which lines have differences.

To install this package on Ubuntu or any other Debian-based distribution, use apt-get. On other Linux distributions, use the respective package management tool provided by your distribution.

Use the following command to install colordiff:

sudo apt-get install colordiff

Use colordiff just as you would use diff.

In fact, colordiff is a wrapper for diff, and diff does all the work behind the scenes. Because of that, all of the diff options will work with colordiff.
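
Since colordiff accepts the same options as diff, a small shell function can pick whichever one is available. This is a sketch only: the function name cdiff is hypothetical, and the sample files are recreated with assumed contents.

```shell
# Hypothetical helper: use colordiff when it is installed, otherwise diff.
cdiff() {
    if command -v colordiff > /dev/null 2>&1; then
        colordiff "$@"
    else
        diff "$@"
    fi
}

# Recreate the assumed sample files in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' Alpha Bravo Charlie Delta Echo Foxtrot Golf Hotel India \
    Juliett Kilo Lima Mike November Oscar Papa Quebec Romeo Sierra Tango \
    Uniform Victor Whiskey X-ray Yankee Zulu > alpha1
sed -e '4s/.*/Dave/' -e '12s/.*/Linux/' -e '21d' alpha1 > alpha2
printf '%s\n' Quirk Strange Charm >> alpha2

# All arguments are passed through unchanged, so diff options still apply.
cdiff -y -W 70 alpha1 alpha2 || true
```

Placing a function like this in a shell startup file would make it available in every session, at the cost of one more name to remember.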

Providing Some Context

In order to strike a balance between displaying all lines in the files on the screen and only listing the changed lines, we can utilize the contextual feature of diff. There are two methods to achieve this, both of which serve the same purpose – showing a few lines before and after each modified line. This allows you to understand the context and comprehend the changes made in the file.

The first method uses the -c (copied context) option.

colordiff -c alpha1 alpha2

The header of the diff output contains the names of the two files and their respective modification times. The first file is denoted by asterisks (*) preceding its name, while the second file is denoted by dashes (-). These symbols serve as indicators to identify to which file the lines in the output correspond.

We can identify lines from alpha1 by a line of asterisks with 1,7 in the middle. Specifically, it refers to lines one to seven. The term Delta is marked as modified, indicated by an exclamation point ( ! ) next to it, and it appears in red. To provide context, there are three lines of unchanged text displayed before and after this modified line.

Furthermore, a line of dashes with 1,7 in the middle informs us that we have switched to examining lines from alpha2. Once again, we are focusing on lines one to seven, and the word Dave on line four is highlighted as a change.

To choose how many lines of context are shown, use the -C (copied context) option, with a capital "C", followed by the number of lines you want. To show two lines of context above and below each change, use the following command:

colordiff -C 2 alpha1 alpha2
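
The effect of the smaller context is easiest to see by looking at the first block of output on its own. The sketch below uses plain diff so that it runs without colordiff installed, and it recreates the sample files with assumed contents.

```shell
# Recreate the assumed sample files in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' Alpha Bravo Charlie Delta Echo Foxtrot Golf Hotel India \
    Juliett Kilo Lima Mike November Oscar Papa Quebec Romeo Sierra Tango \
    Uniform Victor Whiskey X-ray Yankee Zulu > alpha1
sed -e '4s/.*/Dave/' -e '12s/.*/Linux/' -e '21d' alpha1 > alpha2
printf '%s\n' Quirk Strange Charm >> alpha2

# Skip the two header lines (file names and timestamps) and print the
# first block: the changed line four with two lines of context each side.
diff -C 2 alpha1 alpha2 | sed -n '3,15p'
# ***************
# *** 2,6 ****
#   Bravo
#   Charlie
# ! Delta
#   Echo
#   Foxtrot
# --- 2,6 ----
#   Bravo
#   Charlie
# ! Dave
#   Echo
#   Foxtrot
```

The ranges now read 2,6 instead of 1,7 because only two lines are shown on either side of line four.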

The second diff option that offers context is the -u (unified context) option.

colordiff -u alpha1 alpha2

The output includes a header that displays the names of the two files and their modification times. The name of alpha1 is preceded by dashes (-) and the name of alpha2 is preceded by plus signs (+). Lines starting with at signs (@) indicate the start of each difference and denote which lines are being shown from each file.

To provide context for the changed lines, we are shown the three lines before and after the flagged line. In the unified view, the lines with differences are displayed one above the other. The line from alpha1 has a dash before it, while the line from alpha2 has a plus sign before it. This streamlined display achieves in eight lines what the previously mentioned context display took fifteen lines to do.

To specify the desired number of lines for unified context, use the -U option (with a capital "U") followed by the desired line count:

colordiff -U 2 alpha1 alpha2
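
The lines that start with at signs pack the line ranges into a compact form: @@ -1,7 +1,7 @@ means "seven lines starting at line 1 of the first file, and seven lines starting at line 1 of the second file." The sketch below prints the first block with the default context and then shows how the header changes with -U 2. It uses plain diff and recreated sample files, so the file contents are assumptions.

```shell
# Recreate the assumed sample files in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' Alpha Bravo Charlie Delta Echo Foxtrot Golf Hotel India \
    Juliett Kilo Lima Mike November Oscar Papa Quebec Romeo Sierra Tango \
    Uniform Victor Whiskey X-ray Yankee Zulu > alpha1
sed -e '4s/.*/Dave/' -e '12s/.*/Linux/' -e '21d' alpha1 > alpha2
printf '%s\n' Quirk Strange Charm >> alpha2

# Skip the two header lines and print the first block (default: 3 lines
# of context on each side of the change).
diff -u alpha1 alpha2 | sed -n '3,11p'
# @@ -1,7 +1,7 @@
#  Alpha
#  Bravo
#  Charlie
# -Delta
# +Dave
#  Echo
#  Foxtrot
#  Golf

# With two lines of context the block starts one line later and is shorter.
diff -U 2 alpha1 alpha2 | sed -n '3p'
# @@ -2,5 +2,5 @@
```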

Ignoring White Space and Case

Let's analyze another two files, test4 and test5. These have the names of six superheroes in them.

colordiff -y -W 70 test4 test5

diff finds no differences in the Black Widow, Spider-Man, and Thor lines. It does flag changes in the Captain America, Ironman, and The Hulk lines.

So what's different? In test5, "Hulk" is spelled with a lowercase "h," and there is an extra space between "Captain" and "America." The Ironman line looks identical in both files, so it probably contains a hidden difference, such as stray spaces or tab characters at the end of the line.

If these differences are not important to you, you can instruct diff to ignore specific types of line differences, including:

-i: Ignore differences in case.

-Z: Ignore trailing white space.

-b: Ignore changes in the amount of white space.

-w: Ignore all white space changes.

Let's ask diff to check those two files again, but this time to ignore any differences in case.

colordiff -i -y -W 70 test4 test5

The lines with "The Hulk" and "The hulk" are now considered a match, and no difference is flagged for lowercase "h." Let's ask diff to also ignore trailing white space.

colordiff -i -Z -y -W 70 test4 test5

As suspected, the reason for the difference on the Ironman line is most likely trailing white space. This can be confirmed as diff no longer detects a difference for that line. Now, the only remaining difference is in the Captain America line. To resolve this, we can instruct diff to ignore case and disregard any white space issues.

Please use the following command to achieve this:

colordiff -i -w -y -W 70 test4 test5

By telling diff to ignore the differences that we're not concerned about, diff tells us that, for our purposes, the files match.
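
The whole sequence can be reproduced with two small files. The contents of test4 and test5 are not listed in this article, so the versions below are assumptions built from the description: an extra space in "Captain America", trailing spaces after "Ironman", and a lowercase "h" in "The hulk". The order of the names is also assumed. Plain diff is used so that colordiff is not required, and the -Z option needs a version of diff that supports it, such as a recent GNU diff.

```shell
# Create the assumed test files in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'Black Widow' 'Captain America' 'Ironman' \
    'Spider-Man' 'The Hulk' 'Thor' > test4
printf '%s\n' 'Black Widow' 'Captain  America' 'Ironman  ' \
    'Spider-Man' 'The hulk' 'Thor' > test5

# Count the lines of test4 that diff reports as different ("<" lines).
diff test4 test5 | grep -c '^<'          # 3: Captain America, Ironman, The Hulk
diff -i test4 test5 | grep -c '^<'       # 2: case is ignored
diff -i -Z test4 test5 | grep -c '^<'    # 1: trailing white space is ignored too

# Ignoring case and all white space leaves nothing to report.
diff -i -w -s test4 test5                # Files test4 and test5 are identical
```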

The majority of options in the diff command are focused on generating machine-readable output. You can find a detailed list of these options on the Linux man page. The examples mentioned above utilize the options that allow you to easily identify all the variations between different versions of your text files using the command line and your own visual examination.
