Issue
I have two CSV files, like the following:
file1.csv
label,"Part-A"
"ABC mn","2.0"
"XYZ","3.0"
"PQR SN","6"
file2.csv
label,"Part-B"
"XYZ","4.0"
"LMN Wv","8"
"PQR SN","6"
"EFG","1.0"
Desired Output.csv
label,"Part-A","Part-B"
"ABC mn","2.0",NA
"EFG",NA,"1.0"
"LMN Wv",NA,"8"
"PQR SN","6","6"
"XYZ","3.0","4.0"
Currently with the below awk command i am able to combine the matching one's which have entries for label in both the files like PQR and XYZ but unable to append the ones that are not having label values present in both the files:
awk -F, 'NR==FNR{a[$1]=substr($0,length($1)+2);next} ($1 in a){print $0","a[$1]}' file1.csv file2.csv
Solution
We suggest gawk
script which is standard Linux awk
:
script.awk
NR == FNR {
valsStr = sprintf("%s,%s", $2, "na");
rowsArr[$1] = valsStr;
}
NR != FNR && $1 in rowsArr {
split(rowsArr[$1],valsArr);
valsStr = sprintf("%s,%s", valsArr[1], $2);
rowsArr[$1] = valsStr;
next;
}
NR != FNR {
valsStr = sprintf("%s,%s", "na", $2);
rowsArr[$1] = valsStr;
}
END {
printf("%s,%s\n", "label", rowsArr["label"]);
for (rowName in rowsArr) {
if (rowName == "label") continue;
printf("%s,%s\n", rowName, rowsArr[rowName]);
}
}
output:
awk -F, -f script.awk input.{1,2}.txt
label,Part-A,Part-B
LMN,na,8
ABC,2,na
PQR,6,6
EFG,na,1
XYZ,3,4
Answered By - Dudi Boy Answer Checked By - Terry (WPSolving Volunteer)