Thursday, February 17, 2022

[SOLVED] Use awk with two files as a filter

Issue

I was wondering if this could be possible:

I have two files:

file a:

100005282       C
100016196       G
100011755       C
100012890       G
100016339       C
100013563       C
100015603       G
100008436       G
100004906       C

and file b:

rs10904494    100004906 A C
rs11591988    100005282 C T
rs10904561    100008436 T G
rs7906287    100011755 A G
rs9419557    100012890 A G
rs9286070    100013563 T C
rs9419478    100015603 G C
rs11253562   100016196 G T
rs4881551    100016339 C A

I want to match the numbers in $1 of file a against $2 of file b, then compare the letter in $2 of file a with the letter in $3 of file b for the same number. The result should look like this:

rs10904494    100004906 A C
rs10904561    100008436 T G
rs7906287    100011755 A G
rs9419557    100012890 A G
rs9286070    100013563 T C

That is, it should show only the lines of file b that don't match.

Is it possible to do this with awk?


Solution

If you're having trouble with awk, perhaps using grep would be simpler, e.g.

cat file1.txt
100005282   C
100016196   G
100011755   C
100012890   G
100016339   C
100013563   C
100015603   G
100008436   G
100004906   C

cat file2.txt
rs10904494  100004906   A   C
rs11591988  100005282   C   T
rs10904561  100008436   T   G
rs7906287   100011755   A   G
rs9419557   100012890   A   G
rs9286070   100013563   T   C
rs9419478   100015603   G   C
rs11253562  100016196   G   T
rs4881551   100016339   C   A

grep -vFwf file1.txt file2.txt
rs10904494  100004906   A   C
rs10904561  100008436   T   G
rs7906287   100011755   A   G
rs9419557   100012890   A   G
rs9286070   100013563   T   C
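
As a rough illustration of why this works: with -F and -f, each line of file1.txt is treated as a literal string, so a line of file2.txt is dropped only when it contains the number, the separator, and the letter exactly as they appear in file1.txt. The sketch below assumes both files are tab-separated, uses the GNU long option names for readability, and uses hypothetical file names (a.tsv, b.tsv, grep_mismatches.txt) that are not part of the original answer.

```shell
# Tab-separated copies of the sample data (file names are hypothetical).
# printf reuses its format string until all arguments are consumed.
printf '%s\t%s\n' \
  100005282 C 100016196 G 100011755 C 100012890 G 100016339 C \
  100013563 C 100015603 G 100008436 G 100004906 C > a.tsv

printf '%s\t%s\t%s\t%s\n' \
  rs10904494 100004906 A C \
  rs11591988 100005282 C T \
  rs10904561 100008436 T G \
  rs7906287  100011755 A G \
  rs9419557  100012890 A G \
  rs9286070  100013563 T C \
  rs9419478  100015603 G C \
  rs11253562 100016196 G T \
  rs4881551  100016339 C A > b.tsv

# Same command as "grep -vFwf a.tsv b.tsv", with the flags spelled out:
#   --fixed-strings  patterns are literal text, not regular expressions
#   --file           read one pattern per line from a.tsv
#   --word-regexp    a match must start and end on a word boundary
#   --invert-match   print the lines that match no pattern
grep --invert-match --fixed-strings --word-regexp --file=a.tsv b.tsv > grep_mismatches.txt
cat grep_mismatches.txt
```

If the two files use different separators (for example spaces in one and tabs in the other), no pattern can match and every line of the second file is printed. In that case an awk approach that splits on any whitespace is the safer choice.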

Otherwise, this should work for your use-case, assuming the files are tab-separated (drop -F'\t' if the columns are separated by spaces):

awk -F'\t' 'NR==FNR {A[$1,$2]; next} !(($2,$3) in A)' file1.txt file2.txt
rs10904494  100004906   A   C
rs10904561  100008436   T   G
rs7906287   100011755   A   G
rs9419557   100012890   A   G
rs9286070   100013563   T   C
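
To spell out how the one-liner works: NR==FNR is true only while awk reads the first file, so that block stores each number/letter pair as an array key. For the second file, a line is printed when its ($2,$3) pair is not among those keys. Below is a minimal sketch of the same idea, assuming whitespace-separated columns (so no -F option is given) and with the membership test parenthesized explicitly. The array name seen and the output file mismatches.txt are illustrative choices, not from the original answer.

```shell
# Recreate the sample inputs from the question (space-separated here).
cat > file1.txt <<'EOF'
100005282 C
100016196 G
100011755 C
100012890 G
100016339 C
100013563 C
100015603 G
100008436 G
100004906 C
EOF

cat > file2.txt <<'EOF'
rs10904494 100004906 A C
rs11591988 100005282 C T
rs10904561 100008436 T G
rs7906287 100011755 A G
rs9419557 100012890 A G
rs9286070 100013563 T C
rs9419478 100015603 G C
rs11253562 100016196 G T
rs4881551 100016339 C A
EOF

awk '
  # First file: NR (global line count) equals FNR (per-file line count).
  # Referencing seen[$1, $2] creates the key; no value is needed.
  NR == FNR { seen[$1, $2]; next }

  # Second file: print the line when its number/letter pair was never stored.
  !(($2, $3) in seen)
' file1.txt file2.txt > mismatches.txt

cat mismatches.txt
```

With the sample data this prints the five rows listed in the question. Because awk splits on runs of spaces or tabs by default, this variant does not depend on both files using the same separator.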


Answered By - jared_mamrot
Answer Checked By - Marie Seifert (WPSolving Admin)