Issue
Say I have hundreds of *.xml files in /train/xml/, in the following format:
# this is the content of /train/xml/RIGHT_NAME.xml
<annotation>
<path>/train/img/WRONG_NAME.jpg</path> # this is the WRONG_NAME
</annotation>
The file name WRONG_NAME in <path>...</path> should match that of the .xml file, so that it looks like this:
# this is the content of /train/xml/RIGHT_NAME.xml
<annotation>
<path>/train/img/RIGHT_NAME.jpg</path> # this is the **RIGHT_NAME**
</annotation>
One solution I can think of is to:
1. export all file names into a text file:
ls -1 *.xml > filenames.txt
which generates a file with the content:
RIGHT_NAME_0.xml
RIGHT_NAME_1.xml
...
2. then edit filenames.txt, so that it becomes:
# tab at beginning of each line
<path>/train/img/RIGHT_NAME_0.jpg</path>
<path>/train/img/RIGHT_NAME_1.jpg</path>
...
3. Then, replace the third line of the nth .xml file with the nth line from filenames.txt.
Thus the question title.
I've hammered around with sed and awk but had no success. How should I do it (EDIT: on a macOS machine)? Also, is there a more elegant solution?
Thanks in advance for helping out!
---things I've tried (and didn't work out)---
# this replaces the fifth line with an empty string
for i in *.xml ; do perl -i.bak -pe 's/.*/$i/ if $.==5' RIGHT_NAME.xml ; done
# this appends the contents of filenames.txt after the third line
sed -i.bak -e '/\<path\>/r filenames.txt' RIGHT_NAME.xml
# also, trying to utilize the <path>...</path> pattern...
Solution
Untested:
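for xml in *.xml; do
  # On line 3, swap the last path component ending in .jpg for this file's own base name.
  # ${xml%.xml} strips the .xml suffix; -i.bak keeps a backup and works with the sed shipped on macOS.
  sed -E -i.bak '3s/[^/]*\.jpg/'"${xml%.xml}.jpg/" "$xml"
done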
Answered By - rici
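A possible variation, not part of the answer above: if the <path> line is not always the third line, the substitution can be addressed by the tag instead of by line number. The sketch below is a self-contained demo under stated assumptions: the temporary directory and the two sample files are made up for illustration, each file has a single <path> line ending in .jpg</path>, and the file names contain no |, & or backslash characters (those would need escaping in the sed replacement).

```shell
#!/bin/sh
# Sketch: make the file name inside <path>...</path> match each XML file's own name.
# The demo directory and sample files are hypothetical, created only for this example.
set -eu
demo_dir=$(mktemp -d)
cd "$demo_dir"

# Create two sample annotation files that both point at the wrong image name.
for name in RIGHT_NAME_0 RIGHT_NAME_1; do
  printf '<annotation>\n\t<path>/train/img/WRONG_NAME.jpg</path>\n</annotation>\n' > "$name.xml"
done

for xml in *.xml; do
  base=${xml%.xml}   # strip the .xml suffix
  # Only touch lines containing <path>; replace the last path component before </path>.
  sed -E -i.bak "/<path>/s|[^/]*\.jpg</path>|${base}.jpg</path>|" "$xml"
done

# Show the rewritten lines; the .bak files keep the originals.
grep -h '<path>' *.xml
```

Once the result looks right, the backups can be removed with rm *.xml.bak. Running the loop on a copy of the directory first is a cheap way to check the pattern against real files.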