Issue
I need to delete any paragraphs/blocks of text that begin with a time in HH:mm format and end with a specific string from a large number of files in a folder. Each paragraph that needs deleting ends with the string #file. Between each paragraph is a blank line. Is it possible to delete everything between these two? Sample file as follows:
00:00
-Paragraph one. Can be multiple
lines. Paragraph one. Don't delete
this paragraph. #No

19:30
-Paragraph two.
-Can be multiple lines.
-Delete this paragraph. #file

13.30
-Paragraph three. Delete this. #file
So ideally what would be left is:
00:00
-Paragraph one. Can be multiple
lines. Paragraph one. Don't delete
this paragraph. #No
The paragraphs won't ever be the first paragraph of the document, but they could be the last.
I'm no expert so I've been trying things I found online with no luck. Thanks for any help you can give me!
--
Edit:
Thanks everyone for all the help. I ended up with this, as it was closest to what I had already tried. It worked perfectly, for anyone looking for something similar:
gawk -i inplace 'BEGIN{RS=ORS="\n\n"}!/#file/' *.md
Solution
I would use GNU AWK for this task in the following way. Let file.txt content be
00:00
-Paragraph one. Can be multiple
lines. Paragraph one. Don't delete
this paragraph. #No

19:30
-Paragraph two.
-Can be multiple lines.
-Delete this paragraph. #file

13.30
-Paragraph three. Delete this. #file
then
awk 'BEGIN{RS=""}!/#file/' file.txt
gives output
00:00
-Paragraph one. Can be multiple
lines. Paragraph one. Don't delete
this paragraph. #No
Explanation: I set RS (the record separator) to the empty string, which triggers paragraph mode, so records are separated by one or more blank lines. Then I select the records that do not (!) contain #file. If more than one paragraph is kept, the output will have no blank line between them; if you want blank lines between them, replace RS="" with RS=ORS="\n\n".
(tested in GNU Awk 5.0.1)
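The command above drops any paragraph that contains #file anywhere. If it matters that a paragraph is removed only when it starts with an HH:mm time and ends with #file, as the question describes, a stricter variant might look like the sketch below. The sample text and the temporary file are made up for illustration, and the pattern assumes the time sits alone on the first line of the paragraph with a colon as the separator.

```shell
# Build a small sample file in a temporary location (hypothetical content);
# blank lines separate the paragraphs.
sample=$(mktemp)
printf '%s\n' \
  '00:00' \
  '-Keep this paragraph. #No' \
  '' \
  '19:30' \
  '-Delete this paragraph. #file' \
  '' \
  '08:15' \
  '-Mentions #file in the middle, so keep it.' \
  > "$sample"

# RS="" enables paragraph mode; ORS="\n\n" writes a blank line after each kept paragraph.
# A paragraph is dropped only if it starts with an HH:mm line AND ends with #file
# (optionally followed by trailing spaces or tabs).
result=$(awk 'BEGIN { RS = ""; ORS = "\n\n" }
  !(/^[0-9][0-9]:[0-9][0-9]\n/ && /#file[ \t]*$/)' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Here the 19:30 paragraph is removed, while the 08:15 paragraph survives because #file is not at its end. The same awk program could presumably be combined with gawk's -i inplace option, as in the question's edit, to rewrite many files at once; trying it on copies first seems prudent.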
Answered By - Daweo Answer Checked By - Robin (WPSolving Admin)