Thursday, July 28, 2022

[SOLVED] Count specific character per every species of a Fasta file

Issue

I have been trying to count the number of 1s for each species in a FASTA file that looks like this:

>111
1100101010
>102
1110000001

The desired output would be:

>111
5
>102
4

I know how to count the lines that contain a 1 in a file with:

grep -c 1 file

My problem is that I cannot find a way to keep track of the number of 1s for each species (instead of the total for the file).


Solution

A FASTA sequence can be split across several lines, so the record

>111
11001010101110000001

can also be written as

>111
1100101010
1110000001

but none of the other solutions proposed for this question work for the latter. The following program addresses that oversight by summing the counts across every line of a record:

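perl -Mv5.10 -ne'
   if ( /^>/ ) {                # header line: a new record starts here
      say $c if defined $c;     # print the count of the previous record
      $c = 0;
      print;                    # echo the header itself
   } else {
      $c += tr/1//;             # add the number of 1s on this sequence line
   }
   END {
      say $c if defined $c;     # print the count of the last record
   }
' file.fasta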

For both files shown above, the program outputs

>111
9


Answered By - ikegami
Answer Checked By - Cary Denson (WPSolving Admin)