Issue
I have a FASTA file that looks like this:
>sequence_1_g1
ATTTCGGATAA
>sequence_2_g1
AGGCTCTAGGA
>sequence_2_g2
TGTTCTGAAAT
>sequence_2_g3
CACCTCGGAGT
>sequence_3_new_g1
GCGGATAAAGC
I'd like to extract only the number from the text that comes after the last delimiter (_) and append it to the end of each header, so that the output looks like this:
>sequence_1_g1_1
ATTTCGGATAA
>sequence_2_g1_1
AGGCTCTAGGA
>sequence_2_g2_2
TGTTCTGAAAT
>sequence_2_g3_3
CACCTCGGAGT
>sequence_3_new_g1_1
GCGGATAAAGC
I've never used Linux before, and so far I've only found this command, which keeps just the text that comes after the last delimiter: sed -E 's/.*_//' filename.fasta. Can anyone suggest which additional commands I should look into to get my desired output?
Solution
Using sed
$ sed -E 's/.*_.([0-9]+)/&_\1/' input_file
>sequence_1_g1_1
ATTTCGGATAA
>sequence_2_g1_1
AGGCTCTAGGA
>sequence_2_g2_2
TGTTCTGAAAT
>sequence_2_g3_3
CACCTCGGAGT
>sequence_3_new_g1_1
GCGGATAAAGC
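In the command above, `.*_` greedily matches everything up to the last underscore, `.` matches the single character that follows it (the `g`), and `([0-9]+)` captures the digits; in the replacement, `&` stands for the whole match and `\1` for the captured digits. A slightly more defensive variant is sketched below. It assumes that every header starts with `>` and ends with the number, and it leaves all other lines untouched, so it should also cope with more than one letter before the digits. The temporary file is only there to make the example self-contained.

```shell
# Build a small sample FASTA file (same records as in the question).
input_file=$(mktemp)
cat > "$input_file" <<'EOF'
>sequence_1_g1
ATTTCGGATAA
>sequence_2_g1
AGGCTCTAGGA
>sequence_2_g2
TGTTCTGAAAT
>sequence_2_g3
CACCTCGGAGT
>sequence_3_new_g1
GCGGATAAAGC
EOF

# /^>/        apply the substitution to header lines only
# ([0-9]+)$   capture the run of digits at the very end of the header
# &_\1        keep the matched digits (&), then append "_" and the captured digits
result=$(sed -E '/^>/ s/([0-9]+)$/&_\1/' "$input_file")
printf '%s\n' "$result"

# Clean up the temporary sample file.
rm -f "$input_file"
```

To write the result to a new file instead of the terminal, redirect the output, for example `sed -E '/^>/ s/([0-9]+)$/&_\1/' input_file > output_file`.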
Answered By - HatLess Answer Checked By - Pedro (WPSolving Volunteer)