Saturday, February 5, 2022

[SOLVED] Comparing two files in BASH line by line

Issue

I need to write a script that reads two files and prints the lines they have in common. Both files have the same number of lines, and each line contains exactly one word.

File 1:

Blue
Red
Orange
Green
Yellow
Blue

File 2:

Blue
Green
Red
Purple
Yellow
Blue

Expected output:

Blue
Yellow
Blue

In this example, Red and Green appear in both files, but not on the same line number in each, so they are ignored.

I have tried using awk, grep and comm, but couldn't get them to work.

I am looking for the solution that takes the shortest amount of time to process.


Solution

Using awk:

awk 'NR == FNR { lines[NR] = $0 } NR != FNR && lines[FNR] == $0 { print }' file1 file2

Explanation:

  • While reading the first file (NR == FNR), store each line in the lines array, keyed by its line number
  • While reading the second file (NR != FNR), print the current line if it equals the line stored under the same line number (lines[FNR])
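
To check these two rules against the sample data from the question, here is a small sketch. The scratch directory and the variable names workdir and result are only for illustration; the awk program is the same one shown above.

```shell
#!/usr/bin/env bash
# Recreate the two sample files from the question in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' Blue Red Orange Green Yellow Blue > "$workdir/file1"
printf '%s\n' Blue Green Red Purple Yellow Blue > "$workdir/file2"

# Same awk program as above: remember file1's lines by line number, then
# print each line of file2 that equals the line stored under that number.
result=$(awk 'NR == FNR { lines[NR] = $0 } NR != FNR && lines[FNR] == $0 { print }' \
    "$workdir/file1" "$workdir/file2")

# Prints Blue, Yellow, Blue (one per line), matching the expected output.
printf '%s\n' "$result"

rm -rf "$workdir"
```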

The awk one-liner reads each file exactly once and keeps every line of the first file in memory, so its memory use is roughly the size of the first file.
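
If holding the first file in memory is a concern, one possible alternative (not part of the original answer) is to let paste put the two files side by side and compare the columns as they stream past. This is only a sketch: the function name same_position_lines is made up for illustration, it assumes no line contains a tab (which holds when every line is a single word), and it has not been benchmarked against the awk-only version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the lines that are identical at the same line
# number in both files, without storing either file in memory.
same_position_lines() {
    # paste joins line N of each file with a tab, so after splitting on
    # tabs $1 is the word from the first file and $2 the word from the
    # second. Appending "" forces a string (not numeric) comparison.
    paste "$1" "$2" | awk -F '\t' '($1 "") == ($2 "") { print $1 }'
}

# Try it on the sample data from the question.
workdir=$(mktemp -d)
printf '%s\n' Blue Red Orange Green Yellow Blue > "$workdir/file1"
printf '%s\n' Blue Green Red Purple Yellow Blue > "$workdir/file2"

result=$(same_position_lines "$workdir/file1" "$workdir/file2")
printf '%s\n' "$result"

rm -rf "$workdir"
```

Both versions read each file once; which one finishes sooner depends on the awk implementation and the file sizes, so it is worth timing them on real data.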



Answered By - janos
Answer Checked By - Timothy Miller (WPSolving Admin)