Friday, April 15, 2022

[SOLVED] How to merge two files based on column as a key

Issue

How to merge two files based on column as a key, match to $1 column from a.txt and append b.txt based on $3 column

a.txt

aa; 2.5; 0.001;
ab; 1.5; 0.003;
ac; 0.4; 0.002;

b.txt

20-Nov-2014; 1775.00; aa;
20-Nov-2014; 1775.00; aa;
20-Nov-2014; 1463.40; ab;
20-Nov-2014; 1463.40; ac;
20-Nov-2014; 1463.40; ab;

Desired output look like this

20-Nov-2014; 1775.00; aa; aa; 2.5; 0.001;
20-Nov-2014; 1775.00; aa; aa; 2.5; 0.001;
20-Nov-2014; 1463.40; ab; ab; 1.5; 0.003;
20-Nov-2014; 1463.40; ac; ac; 0.4; 0.002;
20-Nov-2014; 1463.40; ab; ab; 1.5; 0.003;

Solution

$ awk -F';' 'FNR==NR{a[$1]=$0;next;} {print $0" " a[substr($3,2)];}' a.txt b.txt
20-Nov-2014; 1775.00; aa; aa; 2.5; 0.001;
20-Nov-2014; 1775.00; aa; aa; 2.5; 0.001;
20-Nov-2014; 1463.40; ab; ab; 1.5; 0.003;
20-Nov-2014; 1463.40; ac; ac; 0.4; 0.002;
20-Nov-2014; 1463.40; ab; ab; 1.5; 0.003;

How it works

awk implicitly loops over every line in the files. Each line is divided into fields.

  • -F';'

    This tells awk to use the semicolon as the field separator.

  • FNR==NR{a[$1]=$0;next;}

    NR is the number of lines that have been read in so far and FNR is the number of lines that have been read in so far from the current file. Consequently, when FNR==NR, we are still reading the first file, a.txt. In that case, this sets assigns the whole line that was just read in, $0, to array a under the key $1, the third field.

    next tells awk to skip the rest of the commands below and jump to the next line and start over.

  • print $0" " a[substr($3,2)]

    If we get here, that means we are working on the second file b.txt. In that case, print each line of this file followed by the line from array a with the key matching the third field.

    In file b.txt, the third field starts with a space. When looking up this field in array a, that space is removed with the substr function.



Answered By - John1024
Answer Checked By - Timothy Miller (WPSolving Admin)