Monday, June 27, 2022

[SOLVED] Replace variables based upon a value in another file

June 27, 2022 awk, bash, grep

Issue

I am new to awk and would like to replace IID in file 1 with regards to the conversion in file 2. Both of these files are txt files.

I have a file (file 1) that looks like this (just showing first two columns) and has 2060 rows:

FID IID
1   RQ00001-2
2   RQ00002-0
3   RQ00004-9
4   RQ00005-4   
5   RQ00006-5

I have another file showing the conversion of the IIDs to another format. This file looks like this:

id Id
468768 RQ00001-2
468769 RQ00006-5
468770 RQ00005-4
468771 RQ00002-0
468772 RQ00004-9

So I want to replace the IID in file 1 with id using file 2 as the conversion. so file 1 should look like this:

So I essentially want to replace IID with id in file 2 using file 2 as the conversion.

I know I can do this with awk but I am not sure how. Any help would be appreciated.

Solution

NOTE: OP's original question included:

awk 'FNR==NR{a[$1]=$2;next} {print $1,$1 in a?a[$1]:$2}' OFS="\t" Input_file2 Input_file1
But I have no idea what this means    ^ and I don't think it's applicable for my problem.

$1 in a ? a[$1] : $2 is the awk ternary operator.

In this case it reads: if $1 in a then output a[$1] else output $2

For this particular case it's saying that if $1 (first field of 2nd file) is an index in the a[] array then print the contents of the array entry a[$1] else print the contents of $2 (second field of the 2nd file); in other words the ternary operator is determining if you keep the current field #2 value or replace it with a cooresponding value from the array.

Having said that I think there's a problem with the current awk code ...

Assumptions:

if file #1 / field #2 matches file #2 / field #2 then ...
replace file #1 / field #2 with the value from file #2 / field #1

One idea for an awk modification:

awk -v OFS="\t" '                          # just my personal preference to list variables first; OP can leave after script and before files

        # process 1st input file

FNR==NR { if ( FNR>1 )                     # skip 1st line "id Id"
             a[$2]=$1                      # 2nd file: 2nd field is index, 1st field is replacement value
          next
        }

        # process 2nd input file

        { print $1,$2 in a ? a[$2] : $2 }  # if 2nd field is an index in array a[] then replace it with said array value else keep current 2nd field 
' Input_file2 Input_file1

 # eliminating comments and pesky white space for a 'compact' one-liner:

 awk -v OFS="\t" 'FNR==NR {if(FNR>1)a[$2]=$1;next}{print $1,$2 in a?a[$2]:$2}' Input_file2 Input_file1

Both of these generate:

FID     IID
1       468768
2       468771
3       468772
4       468770
5       468769

NOTES:

OP has mentioned wanting to replace the values in file #1; OP will need to capture the output from this awk script into another (temporary) file and then overwrite the original file with this new (temporary) file; it's up to OP to determine if a backup copy of file #1 should be made first
OP mentions file #1 has more than 2 columns; assuming the number of columns could be 'large' and/or dynamic, OP can make the following change to the code ...

Modifying code to replace file #1 / field #2 and then print the line:

# change from:

{ print $1,$2 in a ? a[$2] : $2 }

# change to:

{ $2 = $2 in a ? a[$2] : $2; print }   # overwrite value in field #2 in current line and then print current line

Answered By - markp-fuso

Answer Checked By - Marilyn (WPSolving Volunteer)

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Monday, June 27, 2022

[SOLVED] Replace variables based upon a value in another file

Issue

Solution

Popular Posts

Labels