Tuesday, October 25, 2022

[SOLVED] Replace string across two lines with sed command

Issue

I have an XML file on which I want to run a sed command to remove some strings.

Here is a portion of the file:

<?xml version="1.0" ?>
<DataPDU
    xmlns:ns2="urn:swift:saa:xsd:saa.2.0">
    <DbtrAcct>
        <Id>
            <Othr>
                <Id>1234567890</Id>
            </Othr>
        </Id>
    </DbtrAcct>
    <CdtrAcct>
        <Id>
            <Othr>
                <Id>1000002233250</Id>
            </Othr>
        </Id>
    </CdtrAcct>
    <Dt>
        <Dt>2022-10-05</Dt>
    </Dt>
</DataPDU>

From this file, I need to remove the <Id> and <Dt> tags, but only when they contain a nested tag of the same name. When that happens, I need to remove one of the two (the outer one), to get a file that looks like this:

<?xml version="1.0" ?>
<DataPDU
    xmlns:ns2="urn:swift:saa:xsd:saa.2.0">
    <DbtrAcct>
            <Othr>
                <Id>1234567890</Id>
            </Othr>
    </DbtrAcct>
    <CdtrAcct>
            <Othr>
                <Id>1000002233250</Id>
            </Othr>
    </CdtrAcct>
    <Dt>2022-10-05</Dt>
</DataPDU>

[Image: side-by-side comparison of the original and desired files, for easier reading]

For this, I was trying a command like the following (I'll focus on <Id> for now):

sed -i "s/<DbtrAcct>[^<>]*<Id>/<Id>/g" file.xml 

With this, I was trying to replace the string formed by <DbtrAcct> + <Id> with just <Id>, but I'm having trouble matching them because they're not on the same line (as far as I know, sed only reads one line at a time).
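sed does read one line at a time by default, but the N command appends the next input line to the pattern space. After that, \n in the regex matches the line break between the two lines. The following is a minimal sketch of that idea applied to the <DbtrAcct> case, assuming a POSIX sed. The helper name drop_id_after_dbtr is made up for illustration:

```shell
#!/bin/sh
# Sketch (POSIX sed): on a line containing <DbtrAcct>, N appends the next
# line to the pattern space, so "\n" in the regex can match the line break.
# If that second line is a bare <Id>, the s command deletes it.
# drop_id_after_dbtr is a hypothetical helper name used only here.
drop_id_after_dbtr() {
    sed '/<DbtrAcct>/{
        N
        s/\n[[:blank:]]*<Id>$//
    }' "$@"
}

printf '%s\n' '    <DbtrAcct>' '        <Id>' '            <Othr>' | drop_id_after_dbtr
```

This only removes the opening <Id>. Its closing </Id> line would need a similar rule, so on its own the result is not yet balanced XML.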

How can I achieve this? I don't know much about this type of manipulation, but I think this approach might work for what I need.

(The second part of my question is how to escape the "/" in the closing tags, for when I replace those.)
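sed offers two standard ways to match a "/": escape it as \/, or use a different delimiter. With the s command, the delimiter can be any character other than backslash or newline. With an address, you write \cREGEXc. The sketch below demonstrates both approaches on a closing-tag line. The function names are hypothetical:

```shell
#!/bin/sh
# Two standard ways to match a literal "/" in a sed regex (POSIX sed).
# Both helpers delete a line that holds nothing but a </Id> closing tag;
# the function names are made up for this example.

# 1) Escape the slash as \/ inside the usual /.../ delimiters.
drop_close_escaped() { sed '/^[[:blank:]]*<\/Id>$/d'; }

# 2) Pick another delimiter. For an address write \|...|; for the s command
#    the character right after s is the delimiter, e.g. s|</Id>|| (not used here).
drop_close_delim() { sed '\|^[[:blank:]]*</Id>$|d'; }

# Both print only the </Othr> line.
printf '%s\n' '            </Othr>' '        </Id>' | drop_close_escaped
printf '%s\n' '            </Othr>' '        </Id>' | drop_close_delim
```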

I'm also open to other options, such as awk or even echo, if it's worth it.
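For the awk route, here is a rough sketch of the requirement described above, assuming a POSIX awk. It buffers every line first. It then drops a line holding only <Tag>, together with its matching </Tag> at the same indentation, but only when a same-named tag appears between them. The names unwrap_awk and TAGS are illustrative, and the matching is line-based, not a real XML parser:

```shell
#!/bin/sh
# Sketch, assuming a POSIX awk. All lines are buffered in memory first.
# unwrap_awk and TAGS are illustrative names; matching is line-based,
# so this is not an XML parser.
unwrap_awk() {
    awk -v TAGS='Id Dt' '
    { line[NR] = $0 }
    END {
        n = split(TAGS, names, " ")
        for (i = 1; i <= NR; i++) {
            for (k = 1; k <= n; k++) {
                t = names[k]
                # only a line holding nothing but the opening tag, e.g. "    <Id>"
                if (line[i] !~ ("^[ \t]*<" t ">$")) continue
                indent = line[i]
                sub(/<.*/, "", indent)            # keep just the leading blanks
                nested = 0
                for (j = i + 1; j <= NR; j++) {
                    if (line[j] == (indent "</" t ">")) break   # matching close tag
                    if (index(line[j], "<" t ">")) nested = 1  # same tag inside
                }
                # drop both wrapper lines only when a nested tag was found
                if (nested && j <= NR) { skip[i] = 1; skip[j] = 1 }
            }
        }
        for (i = 1; i <= NR; i++)
            if (!(i in skip)) print line[i]
    }' "$@"
}

printf '%s\n' '    <DbtrAcct>' '        <Id>' '            <Othr>' \
    '                <Id>1234567890</Id>' '            </Othr>' '        </Id>' \
    '    </DbtrAcct>' '    <Dt>' '        <Dt>2022-10-05</Dt>' '    </Dt>' | unwrap_awk
```

As with the sed solution below, the inner <Dt> line keeps its own (deeper) indentation.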

I've also tried turning the whole file into a single line, removing the tags, and then formatting it back into XML, but with no luck.
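GNU sed can treat the whole file as a single record without flattening it first. The -z (--null-data) option splits input on NUL bytes instead of newlines, so an ordinary text file becomes one pattern space, and the regexes can contain \n. The following is a minimal sketch under that assumption. Its patterns are hard-coded to the sample's layout: an <Id> wrapping <Othr>, and a <Dt> wrapping a one-line <Dt>. The helper name unwrap_tags is hypothetical:

```shell
#!/bin/sh
# Sketch, assuming GNU sed (-z is a GNU extension). With -z the input is
# split on NUL bytes, so a normal text file is one pattern space and "\n"
# can appear in the regex. "#" is the s delimiter, so "/" needs no escaping.
# unwrap_tags is a hypothetical name; the patterns fit this sample only.
unwrap_tags() {
    sed -E -z '
        # drop a bare <Id> line when the next line opens <Othr>
        s#\n[[:blank:]]*<Id>(\n[[:blank:]]*<Othr>)#\1#g
        # drop the bare </Id> line that follows </Othr>
        s#(\n[[:blank:]]*</Othr>)\n[[:blank:]]*</Id>#\1#g
        # fold <Dt>, <Dt>value</Dt>, </Dt> into one line at the outer indent
        s#\n([[:blank:]]*)<Dt>\n[[:blank:]]*(<Dt>[^<]*</Dt>)\n[[:blank:]]*</Dt>#\n\1\2#g
    ' "$@"
}

# On the real file this would be: unwrap_tags file.xml
printf '%s\n' '    <DbtrAcct>' '        <Id>' '            <Othr>' \
    '                <Id>1234567890</Id>' '            </Othr>' '        </Id>' \
    '    </DbtrAcct>' '    <Dt>' '        <Dt>2022-10-05</Dt>' '    </Dt>' | unwrap_tags
```

On the full sample from the question, these three substitutions give exactly the desired output, including the <Dt> indentation. A file with a different nesting would need its own patterns.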


Solution

This might work for you (GNU sed):

sed -E '/^\s*<(Id|Dt)>/{:a;N;/^(\s*<)(\S+>).*\n\1\/\2/!ba;s/^\s*(<\S+>)[^\n]*\n(.*\1.*)\n.*/\2/}' file

If a line starts with <Id> or <Dt>, gather it up together with the following lines until its end tag at the same indentation.

If the collection contains another tag of the same name, remove the start and end lines of the collection.
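To make the steps easier to follow, here is the same GNU sed command laid out over several lines, with a comment on each step. Only whitespace and comments were added; the commands are unchanged. It still relies on GNU features such as \s and \S. The wrapper name collapse_same_tag is made up for this example:

```shell
#!/bin/sh
# The one-liner above, one command per line (GNU sed).
# collapse_same_tag is only a wrapper name for this example.
collapse_same_tag() {
    sed -E '
        # start collecting at a line that opens <Id> or <Dt>
        /^\s*<(Id|Dt)>/ {
            :a
            # append the next input line to the pattern space
            N
            # loop until a line closes the same tag at the same indentation
            /^(\s*<)(\S+>).*\n\1\/\2/!ba
            # if the first tag appears again inside, keep only the inner lines
            s/^\s*(<\S+>)[^\n]*\n(.*\1.*)\n.*/\2/
        }
    ' "$@"
}

printf '%s\n' '    <Dt>' '        <Dt>2022-10-05</Dt>' '    </Dt>' | collapse_same_tag
```

When the collected block has no nested tag of the same name, the s command does not match and the lines are printed unchanged.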



Answered By - potong
Answer Checked By - David Marino (WPSolving Volunteer)