Issue
I have multiple alignment format (MAF) files that look like this:
##maf version=1
a score=-1274
s Chr10 34972197 2927 + 190919061 AACCTTGGGG
s Chr11 36777315 2442 + 244384623 AACCTTGGGG
a score=-60687
s Chr1 81897274 61972 + 159217232 CGTTTTCCCGG
s Chr1 33997294 32248 + 200980605 CGTTTTCCCGG
I would like to modify the second column of these files for lines that start with "s", to have something like this:
##maf version=1
a score=-1274
s species1.Chr10 34972197 2927 + 190919061 AACCTTGGGG
s species2.Chr11 36777315 2442 + 244384623 AACCTTGGGG
a score=-60687
s species1.Chr1 81897274 61972 + 159217232 CGTTTTCCCGG
s species2.Chr1 33997294 32248 + 200980605 CGTTTTCCCGG
Based on the idea in https://unix.stackexchange.com/questions/154220/adding-a-character-to-every-other-text-line
I have been trying things like this:
awk '$1 == "s" {print ((NR%2)? "species1.":"") $0}'
But I am still far from reaching my objective. Do you know how I could achieve this?
Solution
Assumptions:
- the spacing between fields is to be maintained
One awk idea:
awk '
!/^s/ { print; sfx=0 }                     # line does not start with "s": print it and reset the sfx counter
/^s/  { n=split($0,a,FS,seps)              # line starts with "s": split it; key is to save each separator as a separate seps[] array entry
        a[2]="species" ++sfx "." a[2]      # add prefix to value in 2nd field
        for (i=1;i<=n;i++)                 # loop through all field/separator pairs
            printf "%s%s", a[i], seps[i]   # print each field/separator; explicit format keeps any "%" in the data literal
        print ""                           # terminate line
      }
' maf.dat
NOTE: requires GNU awk for the 4th argument to split()
This generates:
##maf version=1
a score=-1274
s species1.Chr10 34972197 2927 + 190919061 AACCTTGGGG
s species2.Chr11 36777315 2442 + 244384623 AACCTTGGGG
a score=-60687
s species1.Chr1 81897274 61972 + 159217232 CGTTTTCCCGG
s species2.Chr1 33997294 32248 + 200980605 CGTTTTCCCGG
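If GNU awk is not available, the same result can probably be had without split() by inserting the prefix directly into the line with sub(), which leaves all other spacing untouched. The following is only a sketch under a few assumptions: the function name prefix_species is hypothetical (it is not part of the answer above), and it treats a line as a sequence line only when it starts with "s" followed by a space or tab, which is slightly stricter than the /^s/ test used above. In the replacement string, "&" stands for the text matched by the regex.

```shell
# prefix_species: hypothetical helper name, chosen for illustration only.
# Reads MAF data from the given files (or stdin) and prefixes the 2nd field
# of each "s" line with species1., species2., ... restarting at each block.
prefix_species() {
    awk '
        /^s[ \t]/ {                                  # sequence line: "s" followed by space/tab
            n++                                      # position of this line within the block
            sub(/^s[ \t]+/, "&species" n ".")        # "&" = matched "s" plus its original spacing
            print
            next
        }
        { n = 0; print }                             # any other line: print as-is, reset counter
    ' "$@"
}

# Small demonstration on inline sample data taken from the question
prefix_species <<'EOF'
##maf version=1
a score=-1274
s Chr10 34972197 2927 + 190919061 AACCTTGGGG
s Chr11 36777315 2442 + 244384623 AACCTTGGGG
EOF
```

To run it on a file instead, call it as: prefix_species maf.dat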
Answered By - markp-fuso Answer Checked By - Terry (WPSolving Volunteer)