Issue
I have two different files, and some rows are missing from one of them. I want to make a new file containing the rows that the two files do not have in common. As an example, I have the following files:
file1:
id1
id22
id3
id4
id43
id100
id433
file2:
id1
id2
id22
id3
id4
id8
id43
id100
id433
id21
I want to extract the rows that exist in file2 but not in file1:
new file:
id2
id8
id21
Any suggestions, please?
Solution
Use the comm utility (assumes bash as the shell):
comm -13 <(sort file1) <(sort file2)
Note how the input must be sorted for this to work, so your delta will be sorted, too.
comm uses an (interleaved) 3-column layout:
- column 1: lines only in file1
- column 2: lines only in file2
- column 3: lines in both files
-13 suppresses columns 1 and 3, which leaves only column 2: the lines exclusive to file2.
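As a quick check of the column layout described above, here is a minimal sketch that recreates the question's sample data in a temporary directory (the directory and the variable name only_in_file2 are illustrative assumptions, and the trailing spaces from the question's file1 are deliberately left out here):

```shell
#!/usr/bin/env bash
# Recreate the sample data from the question, without trailing spaces.
tmp=$(mktemp -d)
printf '%s\n' id1 id22 id3 id4 id43 id100 id433 > "$tmp/file1"
printf '%s\n' id1 id2 id22 id3 id4 id8 id43 id100 id433 id21 > "$tmp/file2"

# No options: all three columns, indented by tabs
# (column 1 = only file1, column 2 = only file2, column 3 = both).
comm <(sort "$tmp/file1") <(sort "$tmp/file2")

# -13 hides columns 1 and 3, so only lines unique to file2 remain.
only_in_file2=$(comm -13 <(sort "$tmp/file1") <(sort "$tmp/file2"))
printf '%s\n' "$only_in_file2"   # id2, id21, id8 (in sorted order)

rm -rf "$tmp"
```

The result comes out in sort order (id2, id21, id8) rather than in the order the lines appear in file2.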
Caveat: For lines to be recognized as common to both files they must match exactly - seemingly identical lines that differ in terms of whitespace (as is the case in the sample data in the question as of this writing, where file1 lines have a trailing space) will not match.
cat -et is a command that visualizes line endings and control characters, which is helpful in diagnosing such problems.
For instance, cat -et file1 would output lines such as id1 $, making it obvious that there's a trailing space before the end of the line (which cat -et marks with $).
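To see this diagnosis in action, the following sketch writes two tiny files, one with trailing spaces and one without, and inspects both (the file contents and the variable names with_space and without_space are made up for illustration):

```shell
#!/usr/bin/env bash
# Two hypothetical files: file1 has a trailing space on each line, file2 does not.
tmp=$(mktemp -d)
printf 'id1 \nid22 \n' > "$tmp/file1"
printf 'id1\nid22\n'   > "$tmp/file2"

# cat -et marks each line end with $, so a trailing space shows up as " $".
with_space=$(cat -et "$tmp/file1")
without_space=$(cat -et "$tmp/file2")

printf '%s\n' "$with_space"      # id1 $  and  id22 $
printf '%s\n' "$without_space"   # id1$   and  id22$

rm -rf "$tmp"
```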
If instead of cleaning up file1 you want to compare the files as-is, try:
comm -13 <(sed -E 's/ +$//' file1 | sort) <(sort file2)
A generalized solution that trims leading and trailing whitespace from the lines of both files:
comm -13 <(sed -E 's/^[[:blank:]]+|[[:blank:]]+$//g' file1 | sort) \
<(sed -E 's/^[[:blank:]]+|[[:blank:]]+$//g' file2 | sort)
Note: The above sed commands require either GNU or BSD sed.
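The effect of trimming can be illustrated with a small sketch. The helper function trim and the sample file contents below are assumptions introduced here to avoid repeating the sed expression; they are not part of the original commands:

```shell
#!/usr/bin/env bash
# trim: hypothetical helper wrapping the sed expression used above.
trim() { sed -E 's/^[[:blank:]]+|[[:blank:]]+$//g' "$1"; }

# Made-up data with stray whitespace on both sides.
tmp=$(mktemp -d)
printf 'id1 \n\tid3\n'      > "$tmp/file1"   # trailing space, leading tab
printf 'id1\nid2\nid3 \n'   > "$tmp/file2"   # trailing space on id3

# Without trimming, no line matches exactly, so every file2 line is reported.
untrimmed=$(comm -13 <(sort "$tmp/file1") <(sort "$tmp/file2"))

# With trimming, only the line that is truly missing from file1 remains.
trimmed=$(comm -13 <(trim "$tmp/file1" | sort) <(trim "$tmp/file2" | sort))

printf 'untrimmed: %s line(s)\n' "$(printf '%s\n' "$untrimmed" | wc -l | tr -d ' ')"
printf 'trimmed:   %s\n' "$trimmed"

rm -rf "$tmp"
```

Here the untrimmed comparison reports all three file2 lines, while the trimmed one reports only id2.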
Answered By - mklement0