Sunday, June 5, 2022

[SOLVED] edit the ID column to add chr to a number

Issue

I have a VCF file: a VCF header followed by genotype information. I want to add chr to the third column. Right now it looks like this:

21 9825796 21_9825796_C_T_b37

I want to add chr in front of the third column, so that it looks like this:

21 9825796 chr21_9825796_C_T_b37

I used this command:

awk '{if($0 !~ /^#/) print "chr"$3; else print $3}' chr21_annotate.vcf > chr21_annotate_38_impute.vcf

But I am not able to get the desired output. Can anyone help?


Solution

Here is a GNU sed solution. Let the content of file.txt be

# this is header
21 9825796 21_9825796_C_T_b37
21 9825796 21_9825796_C_T_b37

then

sed -e '/^#/n' -e 's/\([^[:space:]]*\)/chr\1/3' file.txt

output

# this is header
21 9825796 chr21_9825796_C_T_b37
21 9825796 chr21_9825796_C_T_b37

Explanation: I register two expressions using -e. The first means: if the line starts with #, print it as is and read the next line (the second expression is then applied to that next line). The second replaces the 3rd occurrence of zero or more non-whitespace characters with that occurrence prefixed by chr. I use a capturing group, denoted by \( and \), so that I can use its content in the replacement via \1.
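A note on files with more than one header line: because n loads the next line and sed then carries on with the substitution, a # line that directly follows another # line can be edited too (for example, the ID column name could become chrID). The following is a minimal sketch of a variant that avoids this by negating the address instead of using n. The sample file and the variable names are made up for illustration, and it assumes tab-separated columns as in a standard VCF.

```shell
# Hypothetical sample: two header lines followed by one record (tab-separated).
sample=$(mktemp)
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\n21\t9825796\t21_9825796_C_T_b37\n' > "$sample"

# /^#/! applies the substitution only to lines that do NOT start with '#'.
# \{1,\} requires at least one non-whitespace character; & is the matched text;
# the trailing 3 selects the third such run on the line.
sed_out=$(sed '/^#/!s/[^[:space:]]\{1,\}/chr&/3' "$sample")
printf '%s\n' "$sed_out"

rm -f "$sample"
```

Here every line starting with # is left untouched regardless of how many header lines there are, and only the third field of the record gets the chr prefix.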

(tested in GNU sed 4.2.2)
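For completeness, the awk command from the question fails because it prints only $3, so every other column is dropped. A possible awk version is sketched below. It assumes the columns are tab-separated (as in a standard VCF); the sample file and variable names are hypothetical.

```shell
# Hypothetical sample: one header line and one record (tab-separated).
sample=$(mktemp)
printf '#CHROM\tPOS\tID\n21\t9825796\t21_9825796_C_T_b37\n' > "$sample"

# Assumption: tab-separated input, so tabs are used for both FS and OFS.
# On non-header lines, prefix field 3 with "chr"; the trailing 1 prints
# every line, whether it was modified or not.
awk_out=$(awk 'BEGIN { FS = OFS = "\t" } !/^#/ { $3 = "chr" $3 } 1' "$sample")
printf '%s\n' "$awk_out"

rm -f "$sample"
```

With the file names from the question, the same awk program would be run on chr21_annotate.vcf with the output redirected to chr21_annotate_38_impute.vcf.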



Answered By - Daweo
Answer Checked By - Marilyn (WPSolving Volunteer)