Issue
I have a VCF file: a VCF header followed by genotype information. I want to add chr to the front of the third column. Right now a line looks like this:
21 9825796 21_9825796_C_T_b37
After the change it should look like this:
21 9825796 chr21_9825796_C_T_b37
I tried:
awk '{if($0 !~ /^#/) print "chr"$3; else print $3}' chr21_annotate.vcf > chr21_annotate_38_impute.vcf
But I am not able to get the desired output. Can anyone help?
Solution
Here is a GNU sed solution. Let the content of file.txt be
# this is header
21 9825796 21_9825796_C_T_b37
21 9825796 21_9825796_C_T_b37
then
sed -e '/^#/n' -e 's/\([^[:space:]]*\)/chr\1/3' file.txt
output
# this is header
21 9825796 chr21_9825796_C_T_b37
21 9825796 chr21_9825796_C_T_b37
Explanation: I register two expressions using -e. The first says that if a line starts with #, sed prints it as is and reads the next line (the n command). The second replaces the 3rd occurrence of zero or more non-whitespace characters with that occurrence prefixed by chr. I use a capturing group, denoted by \( and \), so that I can use its content in the replacement via \1.
(tested in GNU sed 4.2.2)
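Two further points, offered as a sketch rather than as part of the tested answer. The awk command in the question prints only the third field of each line instead of the whole line, which is why its output is wrong. Also, n loads the next line and the substitution then runs on that line, so when several header lines follow each other, as they usually do in a VCF file, a header line that directly follows another one can be modified too. The commands below assume tab-separated columns (as in the VCF format) and use a made-up sample.vcf in a temporary directory. The sed variant replaces n with an address negation (!) and requires at least one character per match; both are changes to the command above, not something taken from it.

```shell
# Hypothetical sample: two consecutive header lines, then one tab-separated record.
tmp=$(mktemp -d)
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\n21\t9825796\t21_9825796_C_T_b37\tC\tT\n' > "$tmp/sample.vcf"

# awk: on non-header lines, prefix the third field with chr; then print every line in full.
# Setting FS and OFS to a tab keeps the tab separator when awk rebuilds the changed record.
awk 'BEGIN { FS = OFS = "\t" } !/^#/ { $3 = "chr" $3 } { print }' "$tmp/sample.vcf" > "$tmp/awk_out.vcf"

# sed: run the substitution only on lines that do not start with #.
# \{1,\} requires at least one non-whitespace character, so empty matches are never counted.
sed '/^#/!s/\([^[:space:]]\{1,\}\)/chr\1/3' "$tmp/sample.vcf" > "$tmp/sed_out.vcf"
```

For the file in the question, the awk line would read chr21_annotate.vcf and redirect to chr21_annotate_38_impute.vcf in place of the sample paths.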
Answered By - Daweo
Answer Checked By - Marilyn (WPSolving Volunteer)