comm
The comm command is a useful tool for comparing two sorted files line by line. It outputs three columns that help you quickly identify differences and similarities between the files. This comprehensive guide will explain the basic syntax, common options, practical examples, and advanced usage tips for the comm command so that you can incorporate it into your text processing and scripting workflows.
Introduction
The comm command compares two files that are already sorted and reports the result in three columns:
Column 1: Lines unique to the first file.
Column 2: Lines unique to the second file.
Column 3: Lines common to both files.
This line-by-line comparison is especially useful for tasks such as:
Identifying differences between lists.
Finding common elements in two datasets.
Verifying changes between versions of sorted data.
Basic Syntax and How comm Works
The general syntax of comm is:
comm [OPTION]... FILE1 FILE2
FILE1 and FILE2: These are the two files you want to compare.
Note: Both files must be sorted in the same way (typically in lexicographical order) for comm to work correctly.
By default, comm displays all three columns. If you want to hide one or more of these columns, you can use the appropriate options.
Common Options and Parameters
Suppressing Output Columns
comm provides options to suppress (not display) one or more of the output columns. This can be helpful when you’re interested in only a subset of the comparison:
-1: Suppresses the first column (lines unique to FILE1).
-2: Suppresses the second column (lines unique to FILE2).
-3: Suppresses the third column (common lines between FILE1 and FILE2).
For example, to display only the lines common to both files, you can suppress columns 1 and 2:
comm -12 file1.txt file2.txt
Here, the combined option -12 (or equivalently -1 -2) hides the first two columns, leaving only the common lines (column 3).
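To see how the suppression options change the output, here is a minimal sketch that builds two small sorted files in a temporary directory. The file contents and the workdir variable are illustrative assumptions, not part of any real dataset.

```shell
# Work in a throwaway directory that is removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Two small, already-sorted sample files (contents are illustrative).
printf '%s\n' apple banana cherry > "$workdir/file1.txt"
printf '%s\n' banana cherry date  > "$workdir/file2.txt"

# Hide columns 1 and 2: only the lines common to both files remain.
comm -12 "$workdir/file1.txt" "$workdir/file2.txt"

# Hide column 3: only the lines unique to either file remain.
comm -3 "$workdir/file1.txt" "$workdir/file2.txt"
```

With this sample data, the first command should print banana and cherry, and the second should print apple flush left and date indented by one tab, because date still belongs to column 2.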
File Sorting Requirement
The input files must be sorted. If they are not, comm may produce incorrect or misleading results. If needed, sort each file with the sort command before comparing them.
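One way to check and repair the ordering before calling comm is sketched below. It assumes a deliberately unsorted sample file (list.txt is a made-up name) and relies on two standard sort options: -c to check the order and -o to write the result to a named file.

```shell
# Work in a throwaway directory that is removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# A deliberately unsorted sample file (contents are illustrative).
printf '%s\n' cherry apple banana > "$workdir/list.txt"

# sort -c exits non-zero (and prints a diagnostic) if the file is out of order.
if ! sort -c "$workdir/list.txt" 2>/dev/null; then
    # sort -o writes to the named file, which may be the input file itself.
    sort -o "$workdir/list.txt" "$workdir/list.txt"
fi

cat "$workdir/list.txt"
```

After this runs, list.txt should contain apple, banana, and cherry in that order and is safe to pass to comm.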
Practical Examples
Comparing Two Sorted Files
Assume you have two files, file1.txt and file2.txt, that are already sorted. To compare them:
comm file1.txt file2.txt
This command produces output in three columns:
Column 1: Lines unique to file1.txt
Column 2: Lines unique to file2.txt
Column 3: Lines present in both files
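The three-column layout is easier to read with concrete data. The sketch below assumes two tiny sample files whose contents are purely illustrative, and shows how comm indents each column with tab characters.

```shell
# Work in a throwaway directory that is removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Two small, already-sorted sample files (contents are illustrative).
printf '%s\n' apple banana cherry > "$workdir/file1.txt"
printf '%s\n' banana cherry date  > "$workdir/file2.txt"

# Default output: column 1 has no indent, column 2 has one leading tab,
# and column 3 has two leading tabs.
comm "$workdir/file1.txt" "$workdir/file2.txt"
```

Here apple appears flush left (only in file1.txt), date is indented by one tab (only in file2.txt), and banana and cherry are indented by two tabs (present in both).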
Extracting Common Lines
To see only the lines that both files share, suppress the first and second columns:
comm -12 file1.txt file2.txt
Extracting Unique Lines from Each File
Lines only in the first file:
comm -23 file1.txt file2.txt
Here, -2 suppresses the second column (unique to file2) and -3 suppresses common lines.
Lines only in the second file:
comm -13 file1.txt file2.txt
Here, -1 suppresses the first column (unique to file1) and -3 suppresses common lines.
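A typical use of these two forms is finding what was removed from and added to a list between two snapshots. The sketch below assumes two hypothetical package lists, old.txt and new.txt; the names and contents are made up for illustration.

```shell
# Work in a throwaway directory that is removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Hypothetical sorted snapshots of a package list, before and after a change.
printf '%s\n' curl git vim > "$workdir/old.txt"
printf '%s\n' git htop vim > "$workdir/new.txt"

# Lines only in the first file: entries that disappeared.
removed=$(comm -23 "$workdir/old.txt" "$workdir/new.txt")

# Lines only in the second file: entries that are new.
added=$(comm -13 "$workdir/old.txt" "$workdir/new.txt")

echo "removed: $removed"
echo "added: $added"
```

With this data the script should report curl as removed and htop as added.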
Using comm in a Pipeline
You can also integrate comm into a pipeline. For example, if you have two unsorted files, you might sort them on the fly:
comm <(sort file1.txt) <(sort file2.txt)
This technique uses process substitution to sort the files before passing them to comm.
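Process substitution is a feature of shells such as bash, zsh, and ksh rather than of plain POSIX sh. Where it is not available, one input can arrive on standard input instead, because comm treats a file name of - as standard input. A minimal sketch, assuming one file that is already sorted and one that is not (file names and contents are illustrative):

```shell
# Work in a throwaway directory that is removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

printf '%s\n' apple banana cherry > "$workdir/sorted.txt"    # already sorted
printf '%s\n' date cherry banana  > "$workdir/unsorted.txt"  # not sorted

# Sort the second list on the fly and hand it to comm on standard input ("-").
sort "$workdir/unsorted.txt" | comm -12 "$workdir/sorted.txt" -
```

This should print banana and cherry, the lines the two lists have in common. Only one of the two operands can be -, so the other input still has to be a sorted file.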
Advanced Usage and Tips
Sorting Files Before Comparison
If your files are not already sorted, always sort them before using comm, for example when comparing two unsorted lists of names.
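One detail worth keeping in mind is that sort and comm both follow the collation order of the current locale, so they need to agree on it. The sketch below pins the locale to C for both steps; the name lists and file names are illustrative assumptions.

```shell
# Work in a throwaway directory that is removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Two unsorted lists of names with mixed case (contents are illustrative).
printf '%s\n' Dave alice Carol bob > "$workdir/names1.txt"
printf '%s\n' bob Erin alice       > "$workdir/names2.txt"

# Pin the collation order so that sort and comm agree regardless of
# the user's locale. In the C locale, uppercase sorts before lowercase.
export LC_ALL=C

sort "$workdir/names1.txt" > "$workdir/names1.sorted"
sort "$workdir/names2.txt" > "$workdir/names2.sorted"

comm "$workdir/names1.sorted" "$workdir/names2.sorted"
```

In the C locale this should list Carol and Dave in column 1, Erin in column 2, and alice and bob in column 3. Sorting under one locale and comparing under another is a common source of confusing comm output.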
Integrating comm with Other Tools
comm works well in combination with other commands:
Diff Alternatives: While diff shows detailed changes between files, comm provides a concise overview of common and unique lines.
Data Processing: Use comm to compare outputs from commands like cut or awk to analyze data differences in a structured format.
For example, if you have two sorted lists of usernames extracted from different systems, comm -12 shows the usernames common to both files.
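As a sketch of how cut and comm can work together, the example below assumes two hypothetical passwd-style exports (system1.txt and system2.txt, with made-up users and fields), extracts the username field from each, and compares the results.

```shell
# Work in a throwaway directory that is removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Hypothetical passwd-style exports from two systems (user:uid:shell).
printf '%s\n' 'root:0:/bin/sh' 'dana:1002:/bin/bash' 'alex:1001:/bin/bash' > "$workdir/system1.txt"
printf '%s\n' 'root:0:/bin/sh' 'alex:2001:/bin/zsh' 'sam:2002:/bin/bash'   > "$workdir/system2.txt"

# Keep only the first colon-separated field (the username) and sort it.
cut -d: -f1 "$workdir/system1.txt" | sort > "$workdir/users1.txt"
cut -d: -f1 "$workdir/system2.txt" | sort > "$workdir/users2.txt"

# Usernames present on both systems.
comm -12 "$workdir/users1.txt" "$workdir/users2.txt"
```

With this sample data the result should be alex and root. Swapping -12 for -23 or -13 would instead list the accounts that exist on only one of the two systems.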
Conclusion and Further Reading
The comm command is a straightforward yet powerful tool for comparing sorted files. By outputting differences in three clear columns, it helps you quickly identify unique and common lines between files. Whether you’re comparing lists, verifying backups, or processing data in scripts, mastering comm can simplify many common tasks.
Further Reading and Resources
Manual Page: Access detailed documentation by typing man comm.
Online Documentation:
Tutorials and Examples: Look for additional examples and use cases on forums, blogs, and Q&A sites like Stack Overflow.
Experiment with comm on your own data to explore its capabilities and integrate it into your workflow. Happy comparing!