Issue
I have two files. File 1:
C2
H1
H2
H3
H4
L1
L10
L2
L3
L4
File 2:
/C2/C2_CRRA200017850-1a_H3LJWDSXY_L1
/H1/H1_CRRA200017885-1a_H3LJWDSXY_L1
/H1/H1_CRRA200017885-1a_H5MLCDSXY_L1
/H2/H2_CRRA200017886-1a_H3LJWDSXY_L1
/H3/H3_CRRA200017887-1a_H3LJWDSXY_L1
/H4/H4_CRRA200017888-1a_H3LJWDSXY_L1
/L1/L1_CRRA200017897-1a_H3LJWDSXY_L1
/L2/L2_CRRA200017898-1a_H3LJWDSXY_L1
/L3/L3_CRRA200017899-1a_H3LJWDSXY_L1
/L4/L4_CRRA200017900-1a_H3LJWDSXY_L1
/L4/L4_CRRA200017900-1a_H5MLCDSXY_L1
I need to produce output in which, if multiple lines in file 2 match the same string in file 1, they are concatenated onto one line (separated by a space); otherwise the line is printed on its own. Expected output:
/C2/C2_CRRA200017850-1a_H3LJWDSXY_L1
/H1/H1_CRRA200017885-1a_H3LJWDSXY_L1 /H1/H1_CRRA200017885-1a_H5MLCDSXY_L1
/H2/H2_CRRA200017886-1a_H3LJWDSXY_L1
/H3/H3_CRRA200017887-1a_H3LJWDSXY_L1
/H4/H4_CRRA200017888-1a_H3LJWDSXY_L1
/L1/L1_CRRA200017897-1a_H3LJWDSXY_L1
/L2/L2_CRRA200017898-1a_H3LJWDSXY_L1
/L3/L3_CRRA200017899-1a_H3LJWDSXY_L1
/L4/L4_CRRA200017900-1a_H3LJWDSXY_L1 /L4/L4_CRRA200017900-1a_H5MLCDSXY_L1
Another way to put it: if lines in file 2 have the same string between the first two slashes (for example, H1 in /H1/H1_CRRA200017885-1a_H3LJWDSXY_L1), concatenate them; otherwise print the line on its own.
I have tried several grep commands, but none of them worked.
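The second formulation (group the lines of file 2 by the text between the first two slashes, without consulting file 1) can be sketched in awk, run here from a POSIX shell. This is an assumption about the intended behavior: it groups matching lines even when they are not adjacent, keeps groups in first-seen order, and uses a few lines of the question's file 2 as inline sample input.

```shell
#!/bin/sh
# Sketch: group lines by field 2 (the text between the first two slashes),
# ignoring file1 entirely. Sample input is a subset of the question's file 2.
out=$(printf '%s\n' \
  /C2/C2_CRRA200017850-1a_H3LJWDSXY_L1 \
  /H1/H1_CRRA200017885-1a_H3LJWDSXY_L1 \
  /H1/H1_CRRA200017885-1a_H5MLCDSXY_L1 \
  /H2/H2_CRRA200017886-1a_H3LJWDSXY_L1 |
  awk -F/ '
    # First line with this key: start a group and remember its position
    !($2 in group) { order[++n] = $2; group[$2] = $0; next }
    # Later lines with the same key: append, separated by a space
    { group[$2] = group[$2] " " $0 }
    # Print groups in the order their keys were first seen
    END { for (i = 1; i <= n; i++) print group[order[i]] }
  ')
printf '%s\n' "$out"
```

With a real file, the same awk program would be run as awk -F/ '...' file2. Because it never reads file 1, it would also merge lines whose key is missing from file 1, which differs from the solution below when such lines exist.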
Solution
You can first store every line of file1 as a key in an array, ary.
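Loading file1 this way relies on a common awk idiom. Here is a small sketch using two made-up sample files in a temporary directory (their contents are assumptions for illustration only): while awk reads the first file, FNR equals NR, so each line becomes a key of ary and next skips the rest of the program; for the second file, the in operator tests membership.

```shell
#!/bin/sh
# Sketch of the FNR == NR idiom with small made-up sample files.
dir=$(mktemp -d)
printf '%s\n' C2 H1 > "$dir/file1"
printf '%s\n' H1 X9 C2 > "$dir/file2"
out=$(awk '
  # First file: FNR (line number in this file) == NR (line number overall)
  FNR == NR { ary[$0]; next }
  # Second file: "in" checks membership without creating a new key
  { print $0, (($0 in ary) ? "listed" : "not listed") }
' "$dir/file1" "$dir/file2")
rm -r "$dir"
printf '%s\n' "$out"
```

One caveat of the idiom: if file1 is empty, FNR == NR stays true while file2 is read, so every line of file2 would be treated as file1 data.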
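Then, for each line of file2, take the field between the first two slashes ($2 when FS="/") as the key. If that key is present in ary, store it in a second array, mapper, whose value is all the matching lines concatenated with a space between them.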
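To see why the key is $2, here is a quick check on one line from file 2. With a slash as the field separator, the leading slash produces an empty $1, so the text between the first two slashes lands in $2.

```shell
#!/bin/sh
# Sketch: show how FS="/" splits one path from file 2 into fields.
out=$(echo '/H1/H1_CRRA200017885-1a_H5MLCDSXY_L1' |
  awk -v FS="/" '{ printf "NF=%d $1=[%s] $2=[%s] $3=[%s]\n", NF, $1, $2, $3 }')
printf '%s\n' "$out"
# prints: NF=3 $1=[] $2=[H1] $3=[H1_CRRA200017885-1a_H5MLCDSXY_L1]
```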
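A counter, count, is incremented by 1 for every new key added to mapper and for every line whose key is not in ary. It indexes the result array, which records the order in which entries were first seen; this is needed because awk's for (key in array) loop does not guarantee any particular order.
In the END block, loop over result from 1 to count: if an entry is a key of mapper, print its concatenated lines; otherwise the entry is an unmatched line, so print it unchanged.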
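The ordering and END-block logic can be tried in isolation on a toy input. In this sketch, the colon-separated records and the rule that key x counts as "not in file1" are hypothetical stand-ins for file2 lines and the ary check. Each result entry holds either a group key or a pass-through line, and the END loop resolves it through mapper.

```shell
#!/bin/sh
# Sketch with toy records: key x is a hypothetical "not in file1" key.
out=$(printf '%s\n' a:1 x:9 a:2 b:3 | awk -F: '
  # Unlisted key: keep the whole line at its position
  $1 == "x" { result[++count] = $0; next }
  # Known group: append this line to it
  ($1 in mapper) { mapper[$1] = mapper[$1] " " $0; next }
  # New group: store the line and record the key at this position
  { mapper[$1] = $0; result[++count] = $1 }
  END {
    # Walk positions 1..count so output follows first-seen order
    for (line = 1; line <= count; line++)
      print ((result[line] in mapper) ? mapper[result[line]] : result[line])
  }')
printf '%s\n' "$out"
```

The a:2 record is appended to the a group even though x:9 sits between them, while x:9 keeps its own position in the output.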
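awk -v FS="/" '
  # First file (file1): store each name as a key of ary
  FNR == NR {
    ary[$0]
    next
  }
  # Second file (file2): $2 is the text between the first two slashes
  {
    if ($2 in ary) {
      if ($2 in mapper) {
        # Key seen before: append this line, separated by a space
        mapper[$2] = mapper[$2] " " $0
        next
      }
      # First line for this key: start a group and record its position
      mapper[$2] = $0
      result[++count] = $2
      next
    }
    # Key not listed in file1: keep the line unchanged at its position
    result[++count] = $0
  }
  END {
    # Print in first-seen order: groups via mapper, other lines as-is
    for (line = 1; line <= count; line++) {
      print ((result[line] in mapper) ? mapper[result[line]] : result[line])
    }
  }
' file1 file2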
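To check the script against the question's data, the sketch below writes file1 and file2 (contents copied from the question) into a temporary directory, runs a condensed copy of the awk program with the membership test written as $2 in mapper, and prints the result, which should match the expected output in the question. The temporary directory and the condensed layout are choices made for this check only.

```shell
#!/bin/sh
# Reproduce the question's example end to end in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' C2 H1 H2 H3 H4 L1 L10 L2 L3 L4 > "$dir/file1"
printf '%s\n' \
  /C2/C2_CRRA200017850-1a_H3LJWDSXY_L1 \
  /H1/H1_CRRA200017885-1a_H3LJWDSXY_L1 \
  /H1/H1_CRRA200017885-1a_H5MLCDSXY_L1 \
  /H2/H2_CRRA200017886-1a_H3LJWDSXY_L1 \
  /H3/H3_CRRA200017887-1a_H3LJWDSXY_L1 \
  /H4/H4_CRRA200017888-1a_H3LJWDSXY_L1 \
  /L1/L1_CRRA200017897-1a_H3LJWDSXY_L1 \
  /L2/L2_CRRA200017898-1a_H3LJWDSXY_L1 \
  /L3/L3_CRRA200017899-1a_H3LJWDSXY_L1 \
  /L4/L4_CRRA200017900-1a_H3LJWDSXY_L1 \
  /L4/L4_CRRA200017900-1a_H5MLCDSXY_L1 > "$dir/file2"
# Condensed copy of the awk program from the answer
out=$(awk -v FS="/" '
  FNR == NR { ary[$0]; next }
  {
    if ($2 in ary) {
      if ($2 in mapper) { mapper[$2] = mapper[$2] " " $0; next }
      mapper[$2] = $0; result[++count] = $2; next
    }
    result[++count] = $0
  }
  END {
    for (line = 1; line <= count; line++)
      print ((result[line] in mapper) ? mapper[result[line]] : result[line])
  }
' "$dir/file1" "$dir/file2")
rm -r "$dir"
printf '%s\n' "$out"
```

The L10 entry in file1 has no matching line in file2, so it produces no output; the script only prints what it finds in file2.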
Example file2, extended with a second L1 line, a third L4 line, and X8/X9 lines whose keys are not in file1:
/L1/L1_CRRA200017897-1a_H3LJWDSXY_L1
/C2/C2_CRRA200017850-1a_H3LJWDSXY_L1
/H1/H1_CRRA200017885-1a_H3LJWDSXY_L1
/H1/H1_CRRA200017885-1a_H5MLCDSXY_L1
/H2/H2_CRRA200017886-1a_H3LJWDSXY_L1
/H3/H3_CRRA200017887-1a_H3LJWDSXY_L1
/H4/H4_CRRA200017888-1a_H3LJWDSXY_L1
/L1/L1_CRRA200017897-1a_H3LJWDSXY_L1
/L2/L2_CRRA200017898-1a_H3LJWDSXY_L1
/L3/L3_CRRA200017899-1a_H3LJWDSXY_L1
/L4/L4_CRRA200017900-1a_H3LJWDSXY_L1
/L4/L4_CRRA200017900-1a_H5MLCDSXY_L1
/X9/X9_CRRA200017900-1c_H5MLCDSXY_L1
/X9/X9_CRRA200017900-1c_H5MLCDSXY_L1
/L4/L4_CRRA200017900-1a_H5MLCDSXY_L1
/X8/X8_CRRA200017900-1b_H5MLCDSXY_L1
/X9/X9_CRRA200017900-1c_H5MLCDSXY_L1
Output
/L1/L1_CRRA200017897-1a_H3LJWDSXY_L1 /L1/L1_CRRA200017897-1a_H3LJWDSXY_L1
/C2/C2_CRRA200017850-1a_H3LJWDSXY_L1
/H1/H1_CRRA200017885-1a_H3LJWDSXY_L1 /H1/H1_CRRA200017885-1a_H5MLCDSXY_L1
/H2/H2_CRRA200017886-1a_H3LJWDSXY_L1
/H3/H3_CRRA200017887-1a_H3LJWDSXY_L1
/H4/H4_CRRA200017888-1a_H3LJWDSXY_L1
/L2/L2_CRRA200017898-1a_H3LJWDSXY_L1
/L3/L3_CRRA200017899-1a_H3LJWDSXY_L1
/L4/L4_CRRA200017900-1a_H3LJWDSXY_L1 /L4/L4_CRRA200017900-1a_H5MLCDSXY_L1 /L4/L4_CRRA200017900-1a_H5MLCDSXY_L1
/X9/X9_CRRA200017900-1c_H5MLCDSXY_L1
/X9/X9_CRRA200017900-1c_H5MLCDSXY_L1
/X8/X8_CRRA200017900-1b_H5MLCDSXY_L1
/X9/X9_CRRA200017900-1c_H5MLCDSXY_L1
Answered By - The fourth bird Answer Checked By - Robin (WPSolving Admin)