Issue
I have a list of values stored in a file called Name.txt:
XP_037759835.2
XP_037759838.2
XP_037759836.2
This file is used to find several sequences in another file called sequence.faa:
>NP_001277599.1 actin, alpha cardiac muscle 1 [Chelonia mydas]
MCDDEETTALVCDNGSGLVKAGFAGDDAPRAVFPSIVGRPRHQGVMVGMGQKDSYVGDEAQSKRGILTLKYPIEHGIITN
WDDMEKIWHHTFYNELRVAPEEHPTLLTEAPLNPKANREKMTQIMFETFNVPAMYVAIQAVLSLYASGRTTGIVLDSGDG
VTHNVPIYEGYALPHAIMRLDLAGRDLTDYLMKILTERGYSFVTTAEREIVRDIKEKLCYVALDFENEMATAASSSSLEK
SYELPDGQVITIGNERFRCPETLFQPSFIGMESAGIHETTYNSIMKCDIDIRKDLYANNVLSGGTTMYPGIADRMQKEIT
ALAPSTMKIKIIAPPERKYSVWIGGSILASLSTFQQMWISKQEYDEAGPSIVHRKCF
>NP_001277600.1 cytochrome P450 1A [Chelonia mydas]
MSLLGSQGIISVTEILIASAVFCLTFMVIRSFRQQIPKGLKRLPGPRGYPLIGNLLELGSNPHLTLTQMSQKYGDVMQIR
IGTRPVLVLSGLDTIKQALVKQGEDFMGRPDLYSFHHVADGQSLTFSTDSGEVWRARRKLAQNALKTFSVSPSPNSSSTC
LLEEHVSKEADYLVRKLLQLMEEKKRFDPFRYVVVSVANVICAMCFGNRYDHDDQELLSIVNVTEEFGDVAASGNPVDFI
PVLQYLPNRTMKKFMEFNTRFLRLLQDIVKEHYESFEKDNIRDITDSLIEQSQENKVEANANIQLPKGKIINLVNDLFGA
GFDTVTTALSWSLMYLVTYPDIQKKIQEELDQTIGRERRPRLSDRPMLPYTEAFILEMFRHSSFLPFTIPHCTTKDTVLN
GYYIPKDLCVFVNQWQVNHDEKLWKEPSRFDPERFLRAGGTEVNKTDGEKILIFGLGKRKCLGETIARWEVFLFLTTLLQ
QLEFSISDGQKVDMTPLYGLTMKHKRCEHFQVKQRFPIQSSE
>NP_001277601.1 tumor susceptibility gene 101 protein [Chelonia mydas]
MAVRESELKKMLAKYKYRDLTVQETTSVITQYKDLKPVMDAYVFNDGSSRDLMSLTGTIPVPYRGNTYNIPICLWLLDTY
PFNPPICFVKPTSSMTIKTGKHVDANGKIYLPYLHEWKHPQSDLIGLIQIMIVVFGEEPPVFSRPTISTSFQPYQATGPP
NTSYMPGMPSGISPYPPGHPPNPSGYPGYPYPPGGPFPATTSGQHYTSQPPVTTVGPSRDGTISEDTIRASRISAVSDKL
RWRMKEEMDRAQAELNALKRTEEDLKKGHQKLEEMVTRLDHEVAEVDKNIELLKKKDEELSSALEKMENQSENNDIDEVI
IPTAPLYKQILNLYAEENAIEDTIFYLGEALRRGVIDLDVFLKHVRLLSRKQFQLRALMQKARKTAGLSDLY
>XP_037759835.2 splicing factor 1 isoform X1 [Chelonia mydas]
MAATGANATPLGKLHPPPPPGKPGYPMPPPGPPGLVLPGPPPPPPPGPGQAQAALLGPMAAAAYPFAALPPPPPPPPPPP
PPPQPQPPPQQPQPPPPPPPPPPPQQQQPPPQAGGPQPPPQYGQYRYPSPPPPPQGHEQQQPPPPQQQQQDESGPGGGSN
HDFPNKKRKRSRWNQDTMEQKTVIPGMPTVIPPGLTREQERAYIVQLQIEDLTRKLRTGDLGIPPNPEDRSPSPEPIYNS
EGKRLNTREFRTRKKLEEERHNLITEMVALNPDFKPPADYKPPATRVSDKVMIPQDEYPEINFVGLLIGPRGNTLKNIEK
ECNAKIMIRGKGSVKEGKVGRKDGQMLPGEDEPLHALVTANTMENVKKAVEQIRNILKQGIETPEDQNDLRKMQLRELAR
LNGTLREDDNRILRPWQSAETRSITNTTVCTKCGGAGHIASDCKFSRPGDPQSAQDKARMDKEYLSLMAELGEAPVPASV
GSSSGPTNTPLSSGPRPSGPGNNPPPPNRPPWMNSGPSDNRPYHGMHGGPGGPGGPHNFHHPMPNMGGHGGHPMQHNPNG
PPPWMQPHHPPMNQGPHPPGHPGPHHMDQYLGNTPVGSGVYRLHQGKGMMPPPMGMMAPPPPPPSGQPPPPPSGPLPPWQ
QQQQPPPPPPSSSMASSTPLPWQQNTTTTTTTSAGTGSIPPWQQQGAAVAASTGAPQMQGNPSMVPLPPGVQPPLPPGAP
PPPPPPPPGSAGMMYAPPPPPPPMDPSNFVTMMGMGVPALPPFGMPPAPPPPPPQN
>XP_037759838.2 splicing factor 1 isoform X4 [Chelonia mydas]
MAATGANATPLGKLHPPPPPGKPGYPMPPPGPPGLVLPGPPPPPPPGPGQAQAALLGPMAAAAYPFAALPPPPPPPPPPP
PPPQPQPPPQQPQPPPPPPPPPPPQQQQPPPQAGGPQPPPQYGQYRYPSPPPPPQGHEQQQPPPPQQQQQDESGPGGGSN
HDFPNKKRKRSRWNQDTMEQKTVIPGMPTVIPPGLTREQERAYIVQLQIEDLTRKLRTGDLGIPPNPEDRSPSPEPIYNS
EGKRLNTREFRTRKKLEEERHNLITEMVALNPDFKPPADYKPPATRVSDKVMIPQDEYPEINFVGLLIGPRGNTLKNIEK
ECNAKIMIRGKGSVKEGKVGRKDGQMLPGEDEPLHALVTANTMENVKKAVEQIRNILKQGIETPEDQNDLRKMQLRELAR
LNGTLREDDNRILRPWQSAETRSITNTTVCTKCGGAGHIASDCKFSRPGDPQSAQDKARMDKEYLSLMAELGEAPVPASV
GSSSGPTNTPLSSGPRPSGPGNNPPPPNRPPWMNSGPSDNRPYHGMHGGPGGPGGPHNFHHPMPNMGGHGGHPMQHNPNG
PPPWMQPHHPPMNQGPHPPGHPGPHHMDQYLGNTPVGSGVYRLHQGKDTTTTTTTSAGTGSIPPWQQQGAAVAASTGAPQ
MQGNPSMVPLPPGVQPPLPPGAPPPPPPPPPGSAGMMYAPPPPPPPMDPSNFVTMMGMGVPALPPFGMPPAPPPPPPQN
>XP_037759836.2 splicing factor 1 isoform X2 [Chelonia mydas]
MAATGANATPLGKLHPPPPPGKPGYPMPPPGPPGLVLPGPPPPPPPGPGQAQAALLGPMAAAAYPFAALPPPPPPPPPPP
PPPQPQPPPQQPQPPPPPPPPPPPQQQQPPPQAGGPQPPPQYGQYRYPSPPPPPQGHEQQQPPPPQQQQQDESGPGGGSN
HDFPNKKRKRSRWNQDTMEQKTVIPGMPTVIPPGLTREQERAYIVQLQIEDLTRKLRTGDLGIPPNPEDRSPSPEPIYNS
EGKRLNTREFRTRKKLEEERHNLITEMVALNPDFKPPADYKPPATRVSDKVMIPQDEYPEINFVGLLIGPRGNTLKNIEK
ECNAKIMIRGKGSVKEGKVGRKDGQMLPGEDEPLHALVTANTMENVKKAVEQIRNILKQGIETPEDQNDLRKMQLRELAR
LNGTLREDDNRILRPWQSAETRSITNTTVCTKCGGAGHIASDCKFSRPGDPQSAQDKARMDKEYLSLMAELGEAPVPASV
GSSSGPTNTPLSSGPRPSGPGNNPPPPNRPPWMNSGPSDNRPYHGMHGGPGGPGGPHNFHHPMPNMGGHGGHPMQHNPNG
PPPWMQPHHPPMNQGPHPPGHPGPHHMDQYLGNTPVGSGVYRLHQGKGMMPPPMGMMAPPPPPPSGQPPPPPSGPLPPWQ
QQQQPPPPPPSSSMASSTPLPWQQSEYDDHHHHERWHRVHPAMAAAGGCGGGFYGGPADARQPLHGPFASRGPASAAARG
PAAAAAAAAWLRGHDVRPAPSPAPHGPF
The desired output is that the names XP_037759835.2, XP_037759838.2 and XP_037759836.2 are matched and the sequences below them are extracted:
>XP_037759835.2 splicing factor 1 isoform X1 [Chelonia mydas]
MAATGANATPLGKLHPPPPPGKPGYPMPPPGPPGLVLPGPPPPPPPGPGQAQAALLGPMAAAAYPFAALPPPPPPPPPPP
PPPQPQPPPQQPQPPPPPPPPPPPQQQQPPPQAGGPQPPPQYGQYRYPSPPPPPQGHEQQQPPPPQQQQQDESGPGGGSN
HDFPNKKRKRSRWNQDTMEQKTVIPGMPTVIPPGLTREQERAYIVQLQIEDLTRKLRTGDLGIPPNPEDRSPSPEPIYNS
EGKRLNTREFRTRKKLEEERHNLITEMVALNPDFKPPADYKPPATRVSDKVMIPQDEYPEINFVGLLIGPRGNTLKNIEK
ECNAKIMIRGKGSVKEGKVGRKDGQMLPGEDEPLHALVTANTMENVKKAVEQIRNILKQGIETPEDQNDLRKMQLRELAR
LNGTLREDDNRILRPWQSAETRSITNTTVCTKCGGAGHIASDCKFSRPGDPQSAQDKARMDKEYLSLMAELGEAPVPASV
GSSSGPTNTPLSSGPRPSGPGNNPPPPNRPPWMNSGPSDNRPYHGMHGGPGGPGGPHNFHHPMPNMGGHGGHPMQHNPNG
PPPWMQPHHPPMNQGPHPPGHPGPHHMDQYLGNTPVGSGVYRLHQGKGMMPPPMGMMAPPPPPPSGQPPPPPSGPLPPWQ
QQQQPPPPPPSSSMASSTPLPWQQNTTTTTTTSAGTGSIPPWQQQGAAVAASTGAPQMQGNPSMVPLPPGVQPPLPPGAP
PPPPPPPPGSAGMMYAPPPPPPPMDPSNFVTMMGMGVPALPPFGMPPAPPPPPPQN
>XP_037759838.2 splicing factor 1 isoform X4 [Chelonia mydas]
MAATGANATPLGKLHPPPPPGKPGYPMPPPGPPGLVLPGPPPPPPPGPGQAQAALLGPMAAAAYPFAALPPPPPPPPPPP
PPPQPQPPPQQPQPPPPPPPPPPPQQQQPPPQAGGPQPPPQYGQYRYPSPPPPPQGHEQQQPPPPQQQQQDESGPGGGSN
HDFPNKKRKRSRWNQDTMEQKTVIPGMPTVIPPGLTREQERAYIVQLQIEDLTRKLRTGDLGIPPNPEDRSPSPEPIYNS
EGKRLNTREFRTRKKLEEERHNLITEMVALNPDFKPPADYKPPATRVSDKVMIPQDEYPEINFVGLLIGPRGNTLKNIEK
ECNAKIMIRGKGSVKEGKVGRKDGQMLPGEDEPLHALVTANTMENVKKAVEQIRNILKQGIETPEDQNDLRKMQLRELAR
LNGTLREDDNRILRPWQSAETRSITNTTVCTKCGGAGHIASDCKFSRPGDPQSAQDKARMDKEYLSLMAELGEAPVPASV
GSSSGPTNTPLSSGPRPSGPGNNPPPPNRPPWMNSGPSDNRPYHGMHGGPGGPGGPHNFHHPMPNMGGHGGHPMQHNPNG
PPPWMQPHHPPMNQGPHPPGHPGPHHMDQYLGNTPVGSGVYRLHQGKDTTTTTTTSAGTGSIPPWQQQGAAVAASTGAPQ
MQGNPSMVPLPPGVQPPLPPGAPPPPPPPPPGSAGMMYAPPPPPPPMDPSNFVTMMGMGVPALPPFGMPPAPPPPPPQN
>XP_037759836.2 splicing factor 1 isoform X2 [Chelonia mydas]
MAATGANATPLGKLHPPPPPGKPGYPMPPPGPPGLVLPGPPPPPPPGPGQAQAALLGPMAAAAYPFAALPPPPPPPPPPP
PPPQPQPPPQQPQPPPPPPPPPPPQQQQPPPQAGGPQPPPQYGQYRYPSPPPPPQGHEQQQPPPPQQQQQDESGPGGGSN
HDFPNKKRKRSRWNQDTMEQKTVIPGMPTVIPPGLTREQERAYIVQLQIEDLTRKLRTGDLGIPPNPEDRSPSPEPIYNS
EGKRLNTREFRTRKKLEEERHNLITEMVALNPDFKPPADYKPPATRVSDKVMIPQDEYPEINFVGLLIGPRGNTLKNIEK
ECNAKIMIRGKGSVKEGKVGRKDGQMLPGEDEPLHALVTANTMENVKKAVEQIRNILKQGIETPEDQNDLRKMQLRELAR
LNGTLREDDNRILRPWQSAETRSITNTTVCTKCGGAGHIASDCKFSRPGDPQSAQDKARMDKEYLSLMAELGEAPVPASV
GSSSGPTNTPLSSGPRPSGPGNNPPPPNRPPWMNSGPSDNRPYHGMHGGPGGPGGPHNFHHPMPNMGGHGGHPMQHNPNG
PPPWMQPHHPPMNQGPHPPGHPGPHHMDQYLGNTPVGSGVYRLHQGKGMMPPPMGMMAPPPPPPSGQPPPPPSGPLPPWQ
QQQQPPPPPPSSSMASSTPLPWQQSEYDDHHHHERWHRVHPAMAAAGGCGGGFYGGPADARQPLHGPFASRGPASAAARG
PAAAAAAAAWLRGHDVRPAPSPAPHGPF
My idea is to extract a certain number of lines after a pattern from Name.txt has matched in sequence.faa. The problem is that I'm not sure whether awk, sed or grep can produce such output. Can anyone give me some solutions or directions to tackle this issue? PS: The above files are examples, not the actual files that I have been working on.
Solution
I would use GNU AWK for this task in the following way. Let the content of names.txt be
XP_037759835.2
XP_037759838.2
XP_037759836.2
and the content of file.txt be
>NP_001277599.1 actin, alpha cardiac muscle 1 [Chelonia mydas]
MCDDEETTALVCDNGSGLVKAGFAGDDAPRAVFPSIVGRPRHQGVMVGMGQKDSYVGDEAQSKRGILTLKYPIEHGIITN
WDDMEKIWHHTFYNELRVAPEEHPTLLTEAPLNPKANREKMTQIMFETFNVPAMYVAIQAVLSLYASGRTTGIVLDSGDG
VTHNVPIYEGYALPHAIMRLDLAGRDLTDYLMKILTERGYSFVTTAEREIVRDIKEKLCYVALDFENEMATAASSSSLEK
SYELPDGQVITIGNERFRCPETLFQPSFIGMESAGIHETTYNSIMKCDIDIRKDLYANNVLSGGTTMYPGIADRMQKEIT
ALAPSTMKIKIIAPPERKYSVWIGGSILASLSTFQQMWISKQEYDEAGPSIVHRKCF
>NP_001277600.1 cytochrome P450 1A [Chelonia mydas]
MSLLGSQGIISVTEILIASAVFCLTFMVIRSFRQQIPKGLKRLPGPRGYPLIGNLLELGSNPHLTLTQMSQKYGDVMQIR
IGTRPVLVLSGLDTIKQALVKQGEDFMGRPDLYSFHHVADGQSLTFSTDSGEVWRARRKLAQNALKTFSVSPSPNSSSTC
LLEEHVSKEADYLVRKLLQLMEEKKRFDPFRYVVVSVANVICAMCFGNRYDHDDQELLSIVNVTEEFGDVAASGNPVDFI
PVLQYLPNRTMKKFMEFNTRFLRLLQDIVKEHYESFEKDNIRDITDSLIEQSQENKVEANANIQLPKGKIINLVNDLFGA
GFDTVTTALSWSLMYLVTYPDIQKKIQEELDQTIGRERRPRLSDRPMLPYTEAFILEMFRHSSFLPFTIPHCTTKDTVLN
GYYIPKDLCVFVNQWQVNHDEKLWKEPSRFDPERFLRAGGTEVNKTDGEKILIFGLGKRKCLGETIARWEVFLFLTTLLQ
QLEFSISDGQKVDMTPLYGLTMKHKRCEHFQVKQRFPIQSSE
>NP_001277601.1 tumor susceptibility gene 101 protein [Chelonia mydas]
MAVRESELKKMLAKYKYRDLTVQETTSVITQYKDLKPVMDAYVFNDGSSRDLMSLTGTIPVPYRGNTYNIPICLWLLDTY
PFNPPICFVKPTSSMTIKTGKHVDANGKIYLPYLHEWKHPQSDLIGLIQIMIVVFGEEPPVFSRPTISTSFQPYQATGPP
NTSYMPGMPSGISPYPPGHPPNPSGYPGYPYPPGGPFPATTSGQHYTSQPPVTTVGPSRDGTISEDTIRASRISAVSDKL
RWRMKEEMDRAQAELNALKRTEEDLKKGHQKLEEMVTRLDHEVAEVDKNIELLKKKDEELSSALEKMENQSENNDIDEVI
IPTAPLYKQILNLYAEENAIEDTIFYLGEALRRGVIDLDVFLKHVRLLSRKQFQLRALMQKARKTAGLSDLY
>XP_037759835.2 splicing factor 1 isoform X1 [Chelonia mydas]
MAATGANATPLGKLHPPPPPGKPGYPMPPPGPPGLVLPGPPPPPPPGPGQAQAALLGPMAAAAYPFAALPPPPPPPPPPP
PPPQPQPPPQQPQPPPPPPPPPPPQQQQPPPQAGGPQPPPQYGQYRYPSPPPPPQGHEQQQPPPPQQQQQDESGPGGGSN
HDFPNKKRKRSRWNQDTMEQKTVIPGMPTVIPPGLTREQERAYIVQLQIEDLTRKLRTGDLGIPPNPEDRSPSPEPIYNS
EGKRLNTREFRTRKKLEEERHNLITEMVALNPDFKPPADYKPPATRVSDKVMIPQDEYPEINFVGLLIGPRGNTLKNIEK
ECNAKIMIRGKGSVKEGKVGRKDGQMLPGEDEPLHALVTANTMENVKKAVEQIRNILKQGIETPEDQNDLRKMQLRELAR
LNGTLREDDNRILRPWQSAETRSITNTTVCTKCGGAGHIASDCKFSRPGDPQSAQDKARMDKEYLSLMAELGEAPVPASV
GSSSGPTNTPLSSGPRPSGPGNNPPPPNRPPWMNSGPSDNRPYHGMHGGPGGPGGPHNFHHPMPNMGGHGGHPMQHNPNG
PPPWMQPHHPPMNQGPHPPGHPGPHHMDQYLGNTPVGSGVYRLHQGKGMMPPPMGMMAPPPPPPSGQPPPPPSGPLPPWQ
QQQQPPPPPPSSSMASSTPLPWQQNTTTTTTTSAGTGSIPPWQQQGAAVAASTGAPQMQGNPSMVPLPPGVQPPLPPGAP
PPPPPPPPGSAGMMYAPPPPPPPMDPSNFVTMMGMGVPALPPFGMPPAPPPPPPQN
>XP_037759838.2 splicing factor 1 isoform X4 [Chelonia mydas]
MAATGANATPLGKLHPPPPPGKPGYPMPPPGPPGLVLPGPPPPPPPGPGQAQAALLGPMAAAAYPFAALPPPPPPPPPPP
PPPQPQPPPQQPQPPPPPPPPPPPQQQQPPPQAGGPQPPPQYGQYRYPSPPPPPQGHEQQQPPPPQQQQQDESGPGGGSN
HDFPNKKRKRSRWNQDTMEQKTVIPGMPTVIPPGLTREQERAYIVQLQIEDLTRKLRTGDLGIPPNPEDRSPSPEPIYNS
EGKRLNTREFRTRKKLEEERHNLITEMVALNPDFKPPADYKPPATRVSDKVMIPQDEYPEINFVGLLIGPRGNTLKNIEK
ECNAKIMIRGKGSVKEGKVGRKDGQMLPGEDEPLHALVTANTMENVKKAVEQIRNILKQGIETPEDQNDLRKMQLRELAR
LNGTLREDDNRILRPWQSAETRSITNTTVCTKCGGAGHIASDCKFSRPGDPQSAQDKARMDKEYLSLMAELGEAPVPASV
GSSSGPTNTPLSSGPRPSGPGNNPPPPNRPPWMNSGPSDNRPYHGMHGGPGGPGGPHNFHHPMPNMGGHGGHPMQHNPNG
PPPWMQPHHPPMNQGPHPPGHPGPHHMDQYLGNTPVGSGVYRLHQGKDTTTTTTTSAGTGSIPPWQQQGAAVAASTGAPQ
MQGNPSMVPLPPGVQPPLPPGAPPPPPPPPPGSAGMMYAPPPPPPPMDPSNFVTMMGMGVPALPPFGMPPAPPPPPPQN
>XP_037759836.2 splicing factor 1 isoform X2 [Chelonia mydas]
MAATGANATPLGKLHPPPPPGKPGYPMPPPGPPGLVLPGPPPPPPPGPGQAQAALLGPMAAAAYPFAALPPPPPPPPPPP
PPPQPQPPPQQPQPPPPPPPPPPPQQQQPPPQAGGPQPPPQYGQYRYPSPPPPPQGHEQQQPPPPQQQQQDESGPGGGSN
HDFPNKKRKRSRWNQDTMEQKTVIPGMPTVIPPGLTREQERAYIVQLQIEDLTRKLRTGDLGIPPNPEDRSPSPEPIYNS
EGKRLNTREFRTRKKLEEERHNLITEMVALNPDFKPPADYKPPATRVSDKVMIPQDEYPEINFVGLLIGPRGNTLKNIEK
ECNAKIMIRGKGSVKEGKVGRKDGQMLPGEDEPLHALVTANTMENVKKAVEQIRNILKQGIETPEDQNDLRKMQLRELAR
LNGTLREDDNRILRPWQSAETRSITNTTVCTKCGGAGHIASDCKFSRPGDPQSAQDKARMDKEYLSLMAELGEAPVPASV
GSSSGPTNTPLSSGPRPSGPGNNPPPPNRPPWMNSGPSDNRPYHGMHGGPGGPGGPHNFHHPMPNMGGHGGHPMQHNPNG
PPPWMQPHHPPMNQGPHPPGHPGPHHMDQYLGNTPVGSGVYRLHQGKGMMPPPMGMMAPPPPPPSGQPPPPPSGPLPPWQ
QQQQPPPPPPSSSMASSTPLPWQQSEYDDHHHHERWHRVHPAMAAAGGCGGGFYGGPADARQPLHGPFASRGPASAAARG
PAAAAAAAAWLRGHDVRPAPSPAPHGPF
then
awk 'BEGIN{RS=">";OFS="|";ORS=""}FNR==NR{$1=$1;regex=$0;next}$0~regex{print ">" $0}' names.txt file.txt
gives output
>XP_037759835.2 splicing factor 1 isoform X1 [Chelonia mydas]
MAATGANATPLGKLHPPPPPGKPGYPMPPPGPPGLVLPGPPPPPPPGPGQAQAALLGPMAAAAYPFAALPPPPPPPPPPP
PPPQPQPPPQQPQPPPPPPPPPPPQQQQPPPQAGGPQPPPQYGQYRYPSPPPPPQGHEQQQPPPPQQQQQDESGPGGGSN
HDFPNKKRKRSRWNQDTMEQKTVIPGMPTVIPPGLTREQERAYIVQLQIEDLTRKLRTGDLGIPPNPEDRSPSPEPIYNS
EGKRLNTREFRTRKKLEEERHNLITEMVALNPDFKPPADYKPPATRVSDKVMIPQDEYPEINFVGLLIGPRGNTLKNIEK
ECNAKIMIRGKGSVKEGKVGRKDGQMLPGEDEPLHALVTANTMENVKKAVEQIRNILKQGIETPEDQNDLRKMQLRELAR
LNGTLREDDNRILRPWQSAETRSITNTTVCTKCGGAGHIASDCKFSRPGDPQSAQDKARMDKEYLSLMAELGEAPVPASV
GSSSGPTNTPLSSGPRPSGPGNNPPPPNRPPWMNSGPSDNRPYHGMHGGPGGPGGPHNFHHPMPNMGGHGGHPMQHNPNG
PPPWMQPHHPPMNQGPHPPGHPGPHHMDQYLGNTPVGSGVYRLHQGKGMMPPPMGMMAPPPPPPSGQPPPPPSGPLPPWQ
QQQQPPPPPPSSSMASSTPLPWQQNTTTTTTTSAGTGSIPPWQQQGAAVAASTGAPQMQGNPSMVPLPPGVQPPLPPGAP
PPPPPPPPGSAGMMYAPPPPPPPMDPSNFVTMMGMGVPALPPFGMPPAPPPPPPQN
>XP_037759838.2 splicing factor 1 isoform X4 [Chelonia mydas]
MAATGANATPLGKLHPPPPPGKPGYPMPPPGPPGLVLPGPPPPPPPGPGQAQAALLGPMAAAAYPFAALPPPPPPPPPPP
PPPQPQPPPQQPQPPPPPPPPPPPQQQQPPPQAGGPQPPPQYGQYRYPSPPPPPQGHEQQQPPPPQQQQQDESGPGGGSN
HDFPNKKRKRSRWNQDTMEQKTVIPGMPTVIPPGLTREQERAYIVQLQIEDLTRKLRTGDLGIPPNPEDRSPSPEPIYNS
EGKRLNTREFRTRKKLEEERHNLITEMVALNPDFKPPADYKPPATRVSDKVMIPQDEYPEINFVGLLIGPRGNTLKNIEK
ECNAKIMIRGKGSVKEGKVGRKDGQMLPGEDEPLHALVTANTMENVKKAVEQIRNILKQGIETPEDQNDLRKMQLRELAR
LNGTLREDDNRILRPWQSAETRSITNTTVCTKCGGAGHIASDCKFSRPGDPQSAQDKARMDKEYLSLMAELGEAPVPASV
GSSSGPTNTPLSSGPRPSGPGNNPPPPNRPPWMNSGPSDNRPYHGMHGGPGGPGGPHNFHHPMPNMGGHGGHPMQHNPNG
PPPWMQPHHPPMNQGPHPPGHPGPHHMDQYLGNTPVGSGVYRLHQGKDTTTTTTTSAGTGSIPPWQQQGAAVAASTGAPQ
MQGNPSMVPLPPGVQPPLPPGAPPPPPPPPPGSAGMMYAPPPPPPPMDPSNFVTMMGMGVPALPPFGMPPAPPPPPPQN
>XP_037759836.2 splicing factor 1 isoform X2 [Chelonia mydas]
MAATGANATPLGKLHPPPPPGKPGYPMPPPGPPGLVLPGPPPPPPPGPGQAQAALLGPMAAAAYPFAALPPPPPPPPPPP
PPPQPQPPPQQPQPPPPPPPPPPPQQQQPPPQAGGPQPPPQYGQYRYPSPPPPPQGHEQQQPPPPQQQQQDESGPGGGSN
HDFPNKKRKRSRWNQDTMEQKTVIPGMPTVIPPGLTREQERAYIVQLQIEDLTRKLRTGDLGIPPNPEDRSPSPEPIYNS
EGKRLNTREFRTRKKLEEERHNLITEMVALNPDFKPPADYKPPATRVSDKVMIPQDEYPEINFVGLLIGPRGNTLKNIEK
ECNAKIMIRGKGSVKEGKVGRKDGQMLPGEDEPLHALVTANTMENVKKAVEQIRNILKQGIETPEDQNDLRKMQLRELAR
LNGTLREDDNRILRPWQSAETRSITNTTVCTKCGGAGHIASDCKFSRPGDPQSAQDKARMDKEYLSLMAELGEAPVPASV
GSSSGPTNTPLSSGPRPSGPGNNPPPPNRPPWMNSGPSDNRPYHGMHGGPGGPGGPHNFHHPMPNMGGHGGHPMQHNPNG
PPPWMQPHHPPMNQGPHPPGHPGPHHMDQYLGNTPVGSGVYRLHQGKGMMPPPMGMMAPPPPPPSGQPPPPPSGPLPPWQ
QQQQPPPPPPSSSMASSTPLPWQQSEYDDHHHHERWHRVHPAMAAAGGCGGGFYGGPADARQPLHGPFASRGPASAAARG
PAAAAAAAAWLRGHDVRPAPSPAPHGPF
Explanation: when processing the 1st file (FNR==NR) I build regex by joining the newline-separated fields with a pipe (|), which means alternation (OR); the assignment $1=$1 forces the record to be rebuilt with OFS. I then instruct GNU AWK to go to the next record, so nothing else happens for that file. When processing the following files I print a record if it matches regex, and I prepend > to it, because that character was consumed as the record separator (RS). Disclaimer: this solution assumes that there are no spaces in the first file and that characters with special meaning in regular expressions are not a problem (e.g. here . is treated as any character).
(tested in GNU Awk 5.0.1)
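If the caveat about regular-expression characters matters for your data, one possible variant is to compare IDs exactly instead of building a regex. The following is only a sketch and is not the command tested above. It assumes that each line of names.txt holds one ID and that the ID is the first whitespace-separated token after > on a header line. The miniature input files are made up for illustration, including a decoy header that a regex dot would match.

```shell
# Hypothetical miniature inputs, made up for illustration only.
tmpdir=$(mktemp -d)
printf '%s\n' 'XP_1.2' 'XP_3.2' > "$tmpdir/names.txt"
printf '%s\n' '>XP_1.2 first protein' 'MAAT' 'GKLH' \
              '>XP_1x2 decoy that a regex dot would match' 'CCCC' \
              '>XP_3.2 third protein' 'MSLL' > "$tmpdir/file.txt"

# Pass 1 (FNR==NR): store every ID from names.txt as an array key.
# Pass 2: on each header line, strip the leading ">" from the first field and
# set keep to 1 when that ID is a stored key, otherwise to 0.
# The bare pattern "keep" prints every line while keep is non-zero.
result=$(awk 'FNR==NR { if ($1 != "") ids[$1]; next }
              /^>/    { keep = (substr($1, 2) in ids) }
              keep' "$tmpdir/names.txt" "$tmpdir/file.txt")
printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Because this version uses a string lookup rather than a regex match, the decoy record XP_1x2 is skipped, and the sequence lines themselves are never tested against the IDs. It keeps the default record separator, so it does not depend on RS=">".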
Answered By - Daweo Answer Checked By - Clifford M. (WPSolving Volunteer)