comm
The comm
command is a useful tool for comparing two sorted files line by line. It outputs three columns that help you quickly identify differences and similarities between the files. This comprehensive guide will explain the basic syntax, common options, practical examples, and advanced usage tips for the comm
command so that you can incorporate it into your text processing and scripting workflows.
Table of Contents
Introduction
The comm
command compares two files that are already sorted and displays the differences in three distinct columns:
Column 1: Lines unique to the first file.
Column 2: Lines unique to the second file.
Column 3: Lines common to both files.
This line-by-line comparison is especially useful for tasks such as:
Identifying differences between lists.
Finding common elements in two datasets.
Verifying changes between versions of sorted data.
Basic Syntax and How comm
Works
The general syntax of comm
is:
FILE1 and FILE2: These are the two files you want to compare.
Note: Both files must be sorted in the same way (typically in lexicographical order) for
comm
to work correctly.
By default, comm
displays all three columns. If you want to hide one or more of these columns, you can use the appropriate options.
Common Options and Parameters
Suppressing Output Columns
comm
provides options to suppress (not display) one or more of the output columns. This can be helpful when you’re interested in only a subset of the comparison:
-1
: Suppresses the first column (lines unique to FILE1).-2
: Suppresses the second column (lines unique to FILE2).-3
: Suppresses the third column (common lines between FILE1 and FILE2).
For example, to display only the lines common to both files, you can suppress columns 1 and 2:
Here, the combined option -12
(or equivalently -1 -2
) hides the first two columns, leaving only the common lines (column 3).
File Sorting Requirement
It is critical to note that the input files must be sorted. If the files are not sorted, comm
may produce incorrect or misleading results. If needed, you can sort the files before comparing them:
Practical Examples
Comparing Two Sorted Files
Assume you have two files, file1.txt
and file2.txt
, that are already sorted. To compare them:
This command produces output in three columns:
Column 1: Lines unique to
file1.txt
Column 2: Lines unique to
file2.txt
Column 3: Lines present in both files
Extracting Common Lines
To see only the lines that both files share, suppress the first and second columns:
Extracting Unique Lines from Each File
Lines only in the first file:
comm -23 file1.txt file2.txtHere,
-2
suppresses the second column (unique to file2) and-3
suppresses common lines.Lines only in the second file:
comm -13 file1.txt file2.txtHere,
-1
suppresses the first column (unique to file1) and-3
suppresses common lines.
Using comm
in a Pipeline
You can also integrate comm
into a pipeline. For example, if you have two unsorted files, you might sort them on the fly:
This technique uses process substitution to sort the files before passing them to comm
.
Advanced Usage and Tips
Sorting Files Before Comparison
If your files are not already sorted, always sort them before using comm
. For example, to compare two unsorted lists of names:
Integrating comm
with Other Tools
comm
works well in combination with other commands:
Diff Alternatives: While
diff
shows detailed changes between files,comm
provides a concise overview of common and unique lines.Data Processing: Use
comm
to compare outputs from commands likecut
orawk
to analyze data differences in a structured format.
For example, if you have two lists of usernames extracted from different systems:
This shows usernames common to both files.
Conclusion and Further Reading
The comm
command is a straightforward yet powerful tool for comparing sorted files. By outputting differences in three clear columns, it helps you quickly identify unique and common lines between files. Whether you’re comparing lists, verifying backups, or processing data in scripts, mastering comm
can simplify many common tasks.
Further Reading and Resources
Manual Page:
Access detailed documentation by typing:man commOnline Documentation:
Tutorials and Examples:
Look for additional examples and use cases on forums, blogs, and Q&A sites like Stack Overflow.
Experiment with comm
on your own data to explore its capabilities and integrate it into your workflow. Happy comparing!