Issue
I have several files containing lines with the unique substring NAME-:
<input type="hidden" name="NAME-00B5JZ" value="350.378,00">
<input type="hidden" name="NAME-0599" value="0,00">
<input type="hidden" name="NAME-7012" value="0,00">
<input type="hidden" name="NAME-0096" value="0,00">
<input type="hidden" name="NAME-0433" value="0,00">
<input type="hidden" name="NAME-1100" value="0,00">
The name and value HTML attributes are always different.
I need to extract tab-separated values into separate files whose names correspond to the original ones.
00B5JZ 350378,00
0599 0,00
0096 0,00
0433 0,00
1100 0,00
Dots should be removed from the value attribute's value.
EDIT: I've decided to edit this post and give another approach for whoever reads this.
Let's say the files are file1.txt, file2.txt, and file3.txt, with nothing else in the current directory:
for f in file*.txt; do cat "$f" | sed 's/^[[:space:]]*//;s/<input.*name="NAME-//;s/" value="/\t/;s/">//;s/\.//g' > "${f%.txt}_out.txt"; done
- first we get all filenames, cat them one by one and pass the contents to sed
- remove all whitespace at the beginning of the line
- remove everything up to the name attribute's value
- replace everything between the name attribute's value and the value attribute's value with a tab character
- remove everything after the value attribute's value
- remove all dots
- save the result to a new file, adding the _out suffix to the original filename right before the txt file extension
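To see what each substitution contributes, here is a minimal sketch that applies the sed expressions from the command above cumulatively to one sample line and prints the result after every step. It assumes GNU sed (for \t in the replacement); the variable names line, step and script are only illustrative.

```shell
#!/usr/bin/env bash
# One sample line taken from the question, with leading whitespace added
# so that the first substitution has something to strip.
line='   <input type="hidden" name="NAME-00B5JZ" value="350.378,00">'

script=''
for step in \
  's/^[[:space:]]*//' \
  's/<input.*name="NAME-//' \
  's/" value="/\t/' \
  's/">//' \
  's/\.//g'
do
  # Append the next substitution and show the line after all steps so far.
  script="${script}${step};"
  printf '%s\n' "$line" | sed "$script"
done
```

The last line printed is the final form: the identifier, a tab, and the value without dots.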
Solution
Use sed:
sed -e 's/.*NAME-\([^"]*\)" value="\([^"]*\)".*/\1\t\2/' -e 's/\.//g' INPUT.HTML
- .* matches any character zero or more times
- [^"]* matches any character but " repeated 0 or more times
- \(...\) captures the enclosed part; here the above substring up to the double quote is remembered in \1 and the value is remembered in \2
- s/PATTERN/REPLACEMENT/ substitutes the pattern with the replacement; here, it extracts the part after NAME- and the value and replaces the whole line with just the two captured parts separated by a tab (\t)
- s/\.//g deletes all dots (the /g means "global", i.e. all of them)
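The command above reads a single file and writes to standard output. A minimal sketch of running it over several files and writing one output file per input might look like this. It assumes GNU sed (for \t), and the scratch directory, the sample file names and the _out suffix are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Scratch directory with sample input files built from the lines in the question.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' \
  '<input type="hidden" name="NAME-00B5JZ" value="350.378,00">' \
  '<input type="hidden" name="NAME-0599" value="0,00">' > file1.txt
printf '%s\n' \
  '<input type="hidden" name="NAME-7012" value="0,00">' \
  '<input type="hidden" name="NAME-0096" value="0,00">' > file2.txt

for f in file*.txt; do
  # Skip output files left over from an earlier run.
  case $f in *_out.txt) continue ;; esac
  # Same two expressions as in the answer; the result goes to <name>_out.txt.
  sed -e 's/.*NAME-\([^"]*\)" value="\([^"]*\)".*/\1\t\2/' -e 's/\.//g' "$f" \
    > "${f%.txt}_out.txt"
done

cat file1_out.txt file2_out.txt
```

Note that the second expression deletes dots on every line, so lines that do not contain NAME- are copied through with their dots removed as well.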
Answered By - choroba