Issue
I am new to awk and would like to replace IID in file 1 with regards to the conversion in file 2. Both of these files are txt files.
I have a file (file 1) that looks like this (just showing first two columns) and has 2060 rows:
FID IID
1 RQ00001-2
2 RQ00002-0
3 RQ00004-9
4 RQ00005-4
5 RQ00006-5
I have another file showing the conversion of the IIDs to another format. This file looks like this:
id Id
468768 RQ00001-2
468769 RQ00006-5
468770 RQ00005-4
468771 RQ00002-0
468772 RQ00004-9
So I want to replace the IID in file 1 with id using file 2 as the conversion. so file 1 should look like this:
FID IID
1 468768
2 468771
3 468772
4 468770
5 468769
So I essentially want to replace IID with id in file 2 using file 2 as the conversion.
I know I can do this with awk but I am not sure how. Any help would be appreciated.
Solution
NOTE: OP's original question included:
awk 'FNR==NR{a[$1]=$2;next} {print $1,$1 in a?a[$1]:$2}' OFS="\t" Input_file2 Input_file1
But I have no idea what this means ^ and I don't think it's applicable for my problem.
$1 in a ? a[$1] : $2
is the awk ternary operator.
In this case it reads: if $1 in a
then output a[$1]
else output $2
For this particular case it's saying that if $1
(first field of 2nd file) is an index in the a[]
array then print the contents of the array entry a[$1]
else print the contents of $2
(second field of the 2nd file); in other words the ternary operator is determining if you keep the current field #2 value or replace it with a cooresponding value from the array.
Having said that I think there's a problem with the current awk
code ...
Assumptions:
- if
file #1 / field #2
matchesfile #2 / field #2
then ... - replace
file #1 / field #2
with the value fromfile #2 / field #1
One idea for an awk
modification:
awk -v OFS="\t" ' # just my personal preference to list variables first; OP can leave after script and before files
# process 1st input file
FNR==NR { if ( FNR>1 ) # skip 1st line "id Id"
a[$2]=$1 # 2nd file: 2nd field is index, 1st field is replacement value
next
}
# process 2nd input file
{ print $1,$2 in a ? a[$2] : $2 } # if 2nd field is an index in array a[] then replace it with said array value else keep current 2nd field
' Input_file2 Input_file1
# eliminating comments and pesky white space for a 'compact' one-liner:
awk -v OFS="\t" 'FNR==NR {if(FNR>1)a[$2]=$1;next}{print $1,$2 in a?a[$2]:$2}' Input_file2 Input_file1
Both of these generate:
FID IID
1 468768
2 468771
3 468772
4 468770
5 468769
NOTES:
- OP has mentioned wanting to replace the values in file #1; OP will need to capture the output from this
awk
script into another (temporary) file and then overwrite the original file with this new (temporary) file; it's up to OP to determine if a backup copy of file #1 should be made first - OP mentions file #1 has more than 2 columns; assuming the number of columns could be 'large' and/or dynamic, OP can make the following change to the code ...
Modifying code to replace file #1 / field #2
and then print the line:
# change from:
{ print $1,$2 in a ? a[$2] : $2 }
# change to:
{ $2 = $2 in a ? a[$2] : $2; print } # overwrite value in field #2 in current line and then print current line
Answered By - markp-fuso Answer Checked By - Marilyn (WPSolving Volunteer)