Issue
I have a file that contains 7000 molecules together with their names and energies. Each molecule starts with the keyword MODEL 1; the second line holds the energy (-9.102 for the first molecule in the example below) and the seventh line holds the molecule's name (S3670 Cefsulodin (sodium).cdx for the first molecule). I want to rank all molecules by energy, so that the lowest (most negative) comes first, and write the result to a text file along with each molecule's name. The energy and name can be on the same line or on different lines. I thought of using grep for the parsing, but I have no experience sorting by a value embedded in a line of text. Can somebody please help? Thank you.
MODEL 1
REMARK VINA RESULT: -9.102 0.000 0.000
REMARK INTER + INTRA: -13.194
REMARK INTER: -12.767
REMARK INTRA: -0.427
REMARK UNBOUND: 0.165
REMARK Name = S3670 Cefsulodin (sodium).cdx
REMARK 8 active torsions:
REMARK status: ('A' for Active; 'I' for Inactive)
REMARK 1 A between atoms: CA_3 and C_8
REMARK 2 A between atoms: CA_5 and N_10
REMARK 3 A between atoms: C_7 and C_12
REMARK 4 A between atoms: C_12 and N_16
REMARK 5 A between atoms: C_15 and C_17
REMARK 6 A between atoms: C_17 and C_21
REMARK 7 A between atoms: C_17 and S_22
REMARK 8 A between atoms: C_30 and C_33
REMARK x y z vdW Elec q Type
REMARK _______ _______ _______ _____ _____ ______ ____
ROOT
ATOM 1 N UNL 1 92.970 106.706 73.996 0.00 0.00 +0.000 N
ATOM 2 C UNL 1 93.751 107.062 75.160 0.00 0.00 +0.000 C
MODEL 1
REMARK VINA RESULT: -6.812 0.000 0.000
REMARK INTER + INTRA: -12.561
REMARK INTER: -11.387
REMARK INTRA: -1.175
REMARK UNBOUND: -1.767
REMARK Name = S3836 6-Gingerol.cdx
REMARK 10 active torsions:
REMARK status: ('A' for Active; 'I' for Inactive)
REMARK 1 A between atoms: C_1 and C_2
REMARK 2 A between atoms: C_1 and C_12
REMARK 3 A between atoms: C_2 and C_3
REMARK 4 A between atoms: C_3 and C_4
REMARK 5 A between atoms: C_4 and C_5
REMARK 6 A between atoms: C_5 and C_6
REMARK 7 A between atoms: C_6 and C_7
REMARK 8 A between atoms: C_7 and C_8
REMARK 9 A between atoms: C_8 and C_9
REMARK 10 A between atoms: C_14 and O_18
REMARK x y z vdW Elec q Type
REMARK _______ _______ _______ _____ _____ ______ ____
ROOT
ATOM 1 C UNL 1 89.880 102.122 75.634 0.00 0.00 +0.000 C
ENDROOT
Solution
A single simple command is unlikely to do this.
One option is to extract the names and the energy values from the original file first, then join the two lists and sort the output.
Here is a demonstration using rq
(https://github.com/fuyuncat/rquery/releases):
[ rquery]$ ./rq -i ';' -q "s @row, trim(substr(@raw,strlen('REMARK Name = '))) | f @raw like 'REMARK Name*'" samples/biomolecules.txt > /tmp/names.tmp
[ rquery]$ ./rq -i ';' -q "s @row, @4| f @raw like 'REMARK VINA RESULT:*'" samples/biomolecules.txt > /tmp/values.tmp
[ rquery]$ cat /tmp/names.tmp
1;S3670 Cefsulodin (sodium).cdx
2;S3836 6-Gingerol.cdx
[ rquery]$ cat /tmp/values.tmp
1;-9.102
2;-6.812
[ rquery]$ ./rq -q "p d/;/ | m @1,@2 where @fileid=1 | s @r[1][2], @2 | f @fileid=2 and @r[1][1]=@1 | o tofloat(@2)" /tmp/names.tmp /tmp/values.tmp
S3670 Cefsulodin (sodium).cdx -9.102
S3836 6-Gingerol.cdx -6.812
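If rq is not available, the same extract, join, and sort idea can be sketched in plain Python. This is only a minimal sketch, not part of the original answer: it assumes every block starts with a line beginning with MODEL and contains one "REMARK VINA RESULT:" line and one "REMARK Name =" line, as in the sample above. The function name rank_molecules and the inline SAMPLE data are made up for illustration.

```python
def rank_molecules(lines):
    """Return (name, energy) pairs sorted from most negative energy upward."""
    records = []
    energy = name = None
    for line in lines:
        line = line.strip()
        if line.startswith("MODEL"):
            # A new block begins: store the previous block if it was complete.
            if energy is not None and name is not None:
                records.append((name, energy))
            energy = name = None
        elif line.startswith("REMARK VINA RESULT:"):
            # The first number after the colon is the energy.
            energy = float(line.split(":", 1)[1].split()[0])
        elif line.startswith("REMARK Name ="):
            name = line.split("=", 1)[1].strip()
    # Store the last block, which has no following MODEL line.
    if energy is not None and name is not None:
        records.append((name, energy))
    return sorted(records, key=lambda rec: rec[1])


# Shortened copy of the sample from the question (illustrative only).
SAMPLE = """\
MODEL 1
REMARK VINA RESULT: -6.812 0.000 0.000
REMARK Name = S3836 6-Gingerol.cdx
MODEL 1
REMARK VINA RESULT: -9.102 0.000 0.000
REMARK Name = S3670 Cefsulodin (sodium).cdx
"""

for mol_name, mol_energy in rank_molecules(SAMPLE.splitlines()):
    print(f"{mol_energy:.3f}\t{mol_name}")
```

For the real file, pass an open file object instead of SAMPLE.splitlines(), for example rank_molecules(open("samples/biomolecules.txt")), and redirect the printed output to a text file. Blocks that lack either the energy or the name line are silently skipped in this sketch.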
Answered By - WeDBA
Answer Checked By - Clifford M. (WPSolving Volunteer)