Issue
I have this 5-columns file :
m64071_220512_054244/46858502/ccs TCTACACGACGCTCTTCCGATCTTATTGGGCACGGTGTCGCCATCTGATCGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTCGAGGTTTGCAGCTATTTTATTTACAAGTATACATTTAACACAATGAAATAAACACTGATATACTGAAGCCTAGTTAATAGTAGTGTAACAATATGCATCATTTTGATGATTACATTATTTTAAACAACAAACTACACTGAAAAATTAATGCCGATAAAATTCTTGGTCATAATATTAAGAAATACAATATATAAATTGAAAATATGATTGCTTAAAATTTGAAAATGGAAGTGAACTCATTTGGACAGACTCAGAGTTAACATAATCTGAAGGGAGGGGAGCTCTGACCCAAATGATATCTTTCAGGTTAACAGAAGAAAAAAGAAGCATAGTTTATCTTCAAGGAGAACGGGCAGTTTGCTTCTTCAGGTA fwd pet047-9952 TATTGGGCACGGTGTC
m64071_220512_054244/52233509/ccs AGCTTTTTTGGAATCTTCTGCTAAAGAAAATCAGACTGCTGTGGATGTTTTTCGAAGGATAATTTTGGAGGCAGAAAAAATGGACGGGGCAGCTTCACAAGGCAAGTCTTCATGCTCGGTGATGTGATTCTGCTGCAAAGCCTGAGGACACTGGGAATATATTCTACCTGAAGAAGCAAACTGCCCGTTCTCCTTGAAGATAAACTATGCTTCTTTTTTCTTCTGTTAACCTGAAAGATATCATTTGGGTCAGAGCTCCCCTCCCTTCAGATTATGTTAACTCTGAGTCTGTCCAAATGAGTTCACTTCCATTTTCAAATTTTAAGCAATCATATTTTCAATTTATATATTGTATTTCTTAATATTATGACCAAGAATTTTATCGGCATTAATTTTTCAGTGTAGTTTGTTGTTTAAAATAATGTAATCATCAAAATGATGCATATTGTTACACTACTATTAACTAGGCTTCAGTATATCAGTGTTTATTTCATTGTGTTAAATGTATACTTGTAAATAAAATAGCTGCAAACCTCGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACGATCAGATGGCGACACCGTGCCCAATAAGATCGGAAGAGCGTCGTGTAGA rev pet047-9952 GACACCGTGCCCAATA
m64071_220512_054244/91226755/ccs TCTACACGACGCTCTTCCGATCTTATTGGGCACGGTGTCGCCATCTGATCGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTCGAGGTTTGCAGCTATTTTATTTACAAGTATACATTTAACACAATGAAATAAACACTGATATACTGAAGCCTAGTTAATAGTAGTGTAACAATATGCATCATTTTGATGATTACATTATTTTAAACAACAAACTACACTGAAAAATTAATGCCGATAAAATTCTTGGTCATAATATTAAGAAATACAATATATAAATTGAAAATATGATTGCTTAAAATTTGAAAATGGAAGTGAACTCATTTGGACAGACTCAGAGTTAACATAATCTGAAGGGAGGGGAGCTCTGACCCAAATGATATCTTTCAGGTTAACAGAAGAAAAAAGAAGCATAGTTTATCTTCAAGGAGAACGGGCAGTTTGCTTCTTCAGGTAGAATATATTCCCAGTGTCCTCAGGCTTTGCAGCAGAATCACATCACCGAGCATGAAGACTTGCCTTGTGAAGCTGCCCCGTCCATTTTTTCTGCCTCCAA fwd pet047-9952 TATTGGGCACGGTGTC
For each line, I need to grep the last column value $5 in the second field $2.
Then, I need to print the same line with an additional $6 column with the grep result with a condition : if ($3 == rev)
, the $6 is grep result + 12 characters after grep or if ($3 == fwd)
the grep result + 12 characters before grep.
awk '$2~/$5/ {match($0, /$5/); if ($4=="rev") print substr($0, RSTART +12, RLENGTH + 12); else print substr($0, RSTART + 0, RLENGTH + 12) ;}' file
$5 values are necessary 16 characters and the pattern I look for is always 12 characters. Then, my $6 output is 28 characters.
Expected output :
m64071_220512_054244/46858502/ccs TCTACACGACGCTCTTCCGATCTTATTGGGCACGGTGTCGCCATCTGATCGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTCGAGGTTTGCAGCTATTTTATTTACAAGTATACATTTAACACAATGAAATAAACACTGATATACTGAAGCCTAGTTAATAGTAGTGTAACAATATGCATCATTTTGATGATTACATTATTTTAAACAACAAACTACACTGAAAAATTAATGCCGATAAAATTCTTGGTCATAATATTAAGAAATACAATATATAAATTGAAAATATGATTGCTTAAAATTTGAAAATGGAAGTGAACTCATTTGGACAGACTCAGAGTTAACATAATCTGAAGGGAGGGGAGCTCTGACCCAAATGATATCTTTCAGGTTAACAGAAGAAAAAAGAAGCATAGTTTATCTTCAAGGAGAACGGGCAGTTTGCTTCTTCAGGTA fwd pet047-9952 TATTGGGCACGGTGTC TATTGGGCACGGTGTCGCCATCTGATCG
m64071_220512_054244/52233509/ccs AGCTTTTTTGGAATCTTCTGCTAAAGAAAATCAGACTGCTGTGGATGTTTTTCGAAGGATAATTTTGGAGGCAGAAAAAATGGACGGGGCAGCTTCACAAGGCAAGTCTTCATGCTCGGTGATGTGATTCTGCTGCAAAGCCTGAGGACACTGGGAATATATTCTACCTGAAGAAGCAAACTGCCCGTTCTCCTTGAAGATAAACTATGCTTCTTTTTTCTTCTGTTAACCTGAAAGATATCATTTGGGTCAGAGCTCCCCTCCCTTCAGATTATGTTAACTCTGAGTCTGTCCAAATGAGTTCACTTCCATTTTCAAATTTTAAGCAATCATATTTTCAATTTATATATTGTATTTCTTAATATTATGACCAAGAATTTTATCGGCATTAATTTTTCAGTGTAGTTTGTTGTTTAAAATAATGTAATCATCAAAATGATGCATATTGTTACACTACTATTAACTAGGCTTCAGTATATCAGTGTTTATTTCATTGTGTTAAATGTATACTTGTAAATAAAATAGCTGCAAACCTCGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACGATCAGATGGCGACACCGTGCCCAATAAGATCGGAAGAGCGTCGTGTAGA rev pet047-9952 GACACCGTGCCCAATA CGATCAGATGGCGACACCGTGCCCAATA
m64071_220512_054244/91226755/ccs TCTACACGACGCTCTTCCGATCTTATTGGGCACGGTGTCGCCATCTGATCGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTCGAGGTTTGCAGCTATTTTATTTACAAGTATACATTTAACACAATGAAATAAACACTGATATACTGAAGCCTAGTTAATAGTAGTGTAACAATATGCATCATTTTGATGATTACATTATTTTAAACAACAAACTACACTGAAAAATTAATGCCGATAAAATTCTTGGTCATAATATTAAGAAATACAATATATAAATTGAAAATATGATTGCTTAAAATTTGAAAATGGAAGTGAACTCATTTGGACAGACTCAGAGTTAACATAATCTGAAGGGAGGGGAGCTCTGACCCAAATGATATCTTTCAGGTTAACAGAAGAAAAAAGAAGCATAGTTTATCTTCAAGGAGAACGGGCAGTTTGCTTCTTCAGGTAGAATATATTCCCAGTGTCCTCAGGCTTTGCAGCAGAATCACATCACCGAGCATGAAGACTTGCCTTGTGAAGCTGCCCCGTCCATTTTTTCTGCCTCCAA fwd pet047-9952 TATTGGGCACGGTGTC TATTGGGCACGGTGTCGCCATCTGATCG
But I do not get what I want.
Solution
You can use the below script to achieve your result:
awk '{
idx = index($2, $5);
if (idx != 0) {
if ($3 == "rev") {
substr_28 = substr($2, idx - 12, 28);
} else {
substr_28 = substr($2, idx, 28);
}
print $0, substr_28;
}
}' your_file_containing_inp
Basically, the script:
first finds the index of the
$5
value in the second field. If the value is found(idx != 0)
.a. it checks the value of the third field.
b. If it's"rev"
, it extracts the28-character
substring starting 12 characters before the index.
c. else, it extracts the substring starting at the index.
Answered By - mandy8055 Answer Checked By - Gilberto Lyons (WPSolving Admin)