Issue
I have a .txt file that looks like this (about 400 rows):
lettuceFMnode_1240 J_C7R5_99354_KNKSR3_Oligomycin 81.52
lettuceFMnode_3755 H_C1R3_99940_KNKSF2_Tubulysin 70
lettuceFMnode_17813 G_C4R5_80184_KNKS113774F_Tetronasin 79.57
lettuceFMnode_69469 J_C11R7_99276_KNKSF2_Nystatin 87.27
I want to edit the names in the entire 2nd column so that only the last part stays, i.e. delete everything up to and including the last _ and keep only what comes after it.
I looked into different solutions using a combination of cut and sed, but couldn't understand how the code should be built.
Would appreciate any tips and help!
Thank you!
Solution
Here's one way:
perl -pe 's/^\S+\s+\K\S+_//'
For every line of input (-p) we execute some code (-e ...).
The code performs a substitution (s/PATTERN/REPLACEMENT/).
The pattern matches as follows:
^ : beginning of string
\S+ : 1 or more non-whitespace characters (the first column)
\s+ : 1 or more whitespace characters (the space after the first column)
\K : do not treat the text matched so far as part of the final match
\S+ : 1 or more non-whitespace characters (the second column)
_ : an underscore
Because + is greedy (it matches as many characters as possible), \S+_ will match everything up to the last _ in the second column.
Because we used \K, only the rest of the pattern (i.e. the part of the match that lies in the second column) gets replaced.
The replacement string is empty, so the match is effectively removed.
Answered By - melpomene