Thursday, December 9, 2021

[SOLVED] Simple way to remove multi-line string using sed

Tags: replace, sed, text

Issue

Using sed, is there a way to remove multiple lines from a text file based on some starting and ending expressions?

I have known markers in the file and want to remove everything between them (markers inclusive). I have seen some really complicated solutions and I would like to do this without resorting to micro commands.

My file looks something like this:

cat /tmp/foobar.txt
this is line 1

this is line 3

tomcat.util.scan.StandardJarScanFilter.jarsToSkip=\
annotations-api.jar,\
ant-junit*.jar,\
ant-launcher.jar,\
ant.jar,\
asm-*.jar,\
aspectj*.jar,\
bootstrap.jar,\
catalina-ant.jar,\
catalina-ha.jar,\
catalina-ssi.jar,\
catalina-storeconfig.jar

the end leave me
and me

I want to remove everything starting at tomcat.util all the way to the last .jar

Solution

tl;dr

I think this is the simplest way, and there is no need for the assembly-like micro commands:

sed '/^tomcat\.util.*$/,/^.*[^\]$/d' /tmp/foobar.txt

which produces

this is line 1

this is line 3


the end leave me
and me

If you want to remove the lines from the file itself rather than print the result to stdout, use the in-place flag (-i):

sed -i '/^tomcat\.util.*$/,/^.*[^\]$/d' /tmp/foobar.txt
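Since -i overwrites the file, it can be worth keeping a backup while you test the expression. A minimal sketch, assuming a sed that accepts a backup suffix attached directly to -i (GNU and BSD sed both do, as far as I know); the demo.txt file and its contents are made up for illustration:

```shell
# Work on a throwaway copy so no real file is at risk.
workdir=$(mktemp -d)
printf '%s\n' 'keep 1' 'start=\' 'a.jar,\' 'b.jar' 'keep 2' > "$workdir/demo.txt"

# -i.bak edits demo.txt in place and saves the original as demo.txt.bak.
sed -i.bak '/^start/,/^.*[^\]$/d' "$workdir/demo.txt"

edited=$(cat "$workdir/demo.txt")
backup_lines=$(wc -l < "$workdir/demo.txt.bak" | tr -d ' ')
printf '%s\n' "$edited"
```

If the result is not what you expected, the untouched original is still there in demo.txt.bak.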

So... how does this work?

sed commands, like vi commands, operate on an address. Normally we don't specify an address, which simply applies the command to every line of the file. For example, to replace "the" with "that" throughout a file we'd normally do

sed -i 's/the/that/g' /tmp/foobar.txt

i.e. applying the substitute (s) command to all lines in the file.

In this case you want to delete some lines so we can use the delete or d command. But we need to tell it where to delete. So we need to give it an address.

The format of a sed command is

[addr][!]command[options]

(see the docs)

If no address is specified, the command is applied to all lines. If the ! is specified, the command is applied to all lines that don't match the address. So far so good.
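To make the effect of ! concrete, here is a small sketch on three made-up input lines (alpha, beta and gamma are just placeholders), using the same d command with and without the negation:

```shell
input=$(printf '%s\n' 'alpha' 'beta' 'gamma')

# Without "!": delete the lines that match /beta/.
without_beta=$(printf '%s\n' "$input" | sed '/beta/d')

# With "!": delete every line that does NOT match /beta/.
only_beta=$(printf '%s\n' "$input" | sed '/beta/!d')

printf 'without !:\n%s\n' "$without_beta"
printf 'with !:\n%s\n' "$only_beta"
```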

The trick here is that addr can be a single address or a range of addresses. The address can be a line number or a regex pattern. You use a , between two addresses to specify a range.

So to delete lines 5 to 8 inclusive you could do

sed -i '5,8d' /tmp/foobar.txt
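A quick way to convince yourself that a numeric range is inclusive at both ends is to try it on a few throwaway lines first. This sketch pipes made-up input through sed without -i, so nothing on disk is changed:

```shell
numbered=$(printf '%s\n' 'line 1' 'line 2' 'line 3' 'line 4' 'line 5')

# Delete lines 2 through 4; both endpoints are removed.
remaining=$(printf '%s\n' "$numbered" | sed '2,4d')

printf '%s\n' "$remaining"
```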

In this case, rather than knowing the line numbers, we know some "markers" and can use regexes instead. The first marker, a line starting with tomcat.util, is found by the regex

/^tomcat\.util.*$/

The second marker is a bit trickier, but if we look we can see that the final line to remove is the first one that does not end with a \. So we can match a line that consists of "anything, but not ending with \"

/^.*[^\]$/

On its own, the second regex would match a whole bunch of lines in the file. Inside a range, though, it only marks the end: once the first address has matched, sed starts looking for the second regex on the following line, and the range stops at the first line that matches.
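This range behaviour is easy to check on a tiny input. In the sketch below (START and the other lines are made-up placeholders) every single line would match the end regex, including the START line itself, yet only two lines are deleted: the one that opens the range and the first matching line after it.

```shell
sample=$(printf '%s\n' 'before' 'START' 'middle' 'after 1' 'after 2')

# None of these lines ends in a backslash, so /^.*[^\]$/ matches all of them.
# The range opens at START and closes at 'middle', the next line that matches.
result=$(printf '%s\n' "$sample" | sed '/^START/,/^.*[^\]$/d')

printf '%s\n' "$result"
```

Note that the line matching the first address is not tested against the second regex, which is why the range does not close on the START line itself.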

Putting that all together, we want to delete (d) all lines in the range that starts at the line beginning with tomcat.util and ends at the first following line that does not end in \, i.e.

sed '/^tomcat\.util.*$/,/^.*[^\]$/d' /tmp/foobar.txt
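If you want to try the whole thing without touching a real config, something like the following should do. It is only a sketch: it writes a shortened copy of the sample from the question to a temporary file (created with mktemp rather than /tmp/foobar.txt) and runs the same expression against it.

```shell
sample_file=$(mktemp)

# Quoted 'EOF' keeps the trailing backslashes literal.
cat > "$sample_file" <<'EOF'
this is line 1

this is line 3

tomcat.util.scan.StandardJarScanFilter.jarsToSkip=\
annotations-api.jar,\
ant-junit*.jar,\
catalina-storeconfig.jar

the end leave me
and me
EOF

# Same expression as above, printing to stdout (no -i).
cleaned=$(sed '/^tomcat\.util.*$/,/^.*[^\]$/d' "$sample_file")
printf '%s\n' "$cleaned"

rm -f "$sample_file"
```

The two blank lines that surrounded the block are left in place, because neither of them is part of the range.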

hope that helps ;-)

Cheers

Karl

Answered By - Karl

This answer was collected from Stack Overflow and tested by the PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
