Issue
I was wondering whether this is possible.
I have two files:
file a:
100005282 C
100016196 G
100011755 C
100012890 G
100016339 C
100013563 C
100015603 G
100008436 G
100004906 C
and file b:
rs10904494 100004906 A C
rs11591988 100005282 C T
rs10904561 100008436 T G
rs7906287 100011755 A G
rs9419557 100012890 A G
rs9286070 100013563 T C
rs9419478 100015603 G C
rs11253562 100016196 G T
rs4881551 100016339 C A
Matching the numbers in $1 of file a with those in $2 of file b, I want to compare the letter in $2 of file a with the letter in $3 of file b. The result should look like this:
rs10904494 100004906 A C
rs10904561 100008436 T G
rs7906287 100011755 A G
rs9419557 100012890 A G
rs9286070 100013563 T C
Only the lines that don't match should be shown.
Is it possible to do this with awk?
Solution
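If you're having trouble with awk, grep may be simpler. Here -f reads the patterns from file1.txt (one whole line each), -F treats them as fixed strings, -w requires them to match whole words, and -v prints only the lines of file2.txt that match none of them. This assumes both files use the same separator between the number and the letter. For example: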
cat file1.txt
100005282 C
100016196 G
100011755 C
100012890 G
100016339 C
100013563 C
100015603 G
100008436 G
100004906 C
cat file2.txt
rs10904494 100004906 A C
rs11591988 100005282 C T
rs10904561 100008436 T G
rs7906287 100011755 A G
rs9419557 100012890 A G
rs9286070 100013563 T C
rs9419478 100015603 G C
rs11253562 100016196 G T
rs4881551 100016339 C A
grep -vFwf file1.txt file2.txt
rs10904494 100004906 A C
rs10904561 100008436 T G
rs7906287 100011755 A G
rs9419557 100012890 A G
rs9286070 100013563 T C
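Because grep compares raw text, a tab in one file and a space in the other would make every pattern miss. A minimal sketch of one way around that, assuming it is acceptable to write normalized copies (the names file1.norm, file2.norm and result.txt are my own) into a scratch directory: convert tabs to spaces with tr first, then run the same grep command. The sample data is the question's, with file1.txt deliberately tab-separated to show the mismatch being handled.

```shell
#!/bin/sh
# Sketch: run the grep approach after normalizing separators, so a
# tab-separated file1.txt and a space-separated file2.txt still line up.
set -e
cd "$(mktemp -d)"   # scratch directory for the sample files

# Sample data from the question; file1.txt deliberately uses tabs.
printf '%s\t%s\n' 100005282 C 100016196 G 100011755 C 100012890 G \
  100016339 C 100013563 C 100015603 G 100008436 G 100004906 C > file1.txt
cat > file2.txt <<'EOF'
rs10904494 100004906 A C
rs11591988 100005282 C T
rs10904561 100008436 T G
rs7906287 100011755 A G
rs9419557 100012890 A G
rs9286070 100013563 T C
rs9419478 100015603 G C
rs11253562 100016196 G T
rs4881551 100016339 C A
EOF

# Turn every tab into a space, then squeeze runs of spaces into one.
tr -s '\t' ' ' < file1.txt > file1.norm
tr -s '\t' ' ' < file2.txt > file2.norm

# -f: one pattern per line of file1.norm; -F: fixed strings, not regexes;
# -w: whole-word matches only; -v: keep lines that match no pattern.
grep -vFwf file1.norm file2.norm > result.txt
cat result.txt
```

This prints the same five lines as above. The printed lines come from the normalized copy, so any tabs in file2.txt appear as single spaces.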
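Otherwise, this awk command should work for your use case. It stores each (number, letter) pair from file1.txt as an array key and prints the lines of file2.txt whose ($2, $3) pair is not among them; awk's default field splitting accepts both tabs and spaces:
awk 'NR==FNR {A[$1,$2]; next} !(($2,$3) in A)' file1.txt file2.txt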
rs10904494 100004906 A C
rs10904561 100008436 T G
rs7906287 100011755 A G
rs9419557 100012890 A G
rs9286070 100013563 T C
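The same idea can be spelled out with comments. The sketch below is a variant, not the answer's exact command: it keys an array (named `letter` here, my own choice) by number, and only reports file2.txt lines whose number also appears in file1.txt, which is one reading of "based on the numbers in $1 from file a and $2 from file b". It assumes default field splitting, so file1.txt is written with tabs and file2.txt with spaces to show that mixed separators still work.

```shell
#!/bin/sh
# Sketch: a commented variant of the awk answer. Default field splitting
# (runs of blanks) accepts both tabs and spaces, and only numbers that
# appear in file1.txt are compared.
set -e
cd "$(mktemp -d)"   # scratch directory for the sample files

# Sample data from the question; file1.txt deliberately uses tabs.
printf '%s\t%s\n' 100005282 C 100016196 G 100011755 C 100012890 G \
  100016339 C 100013563 C 100015603 G 100008436 G 100004906 C > file1.txt
cat > file2.txt <<'EOF'
rs10904494 100004906 A C
rs11591988 100005282 C T
rs10904561 100008436 T G
rs7906287 100011755 A G
rs9419557 100012890 A G
rs9286070 100013563 T C
rs9419478 100015603 G C
rs11253562 100016196 G T
rs4881551 100016339 C A
EOF

awk '
  # First file (NR == FNR): remember the letter recorded for each number.
  NR == FNR { letter[$1] = $2; next }
  # Second file: print lines whose number is known but whose $3 differs.
  ($2 in letter) && letter[$2] != $3
' file1.txt file2.txt > result.txt
cat result.txt
```

With the sample data every number appears in both files, so the output matches the one-liner's. If lines whose number is missing from file1.txt should also be reported, as the one-liner above does, drop the `($2 in letter) &&` part.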
Answered By - jared_mamrot
Answer Checked By - Marie Seifert (WPSolving Admin)