Issue
I'm having a bit of a problem with this... I'm trying to use Bash scripting (Sed, in particular) to process the following text. Other methods are welcome, of course! But I'm hoping it could be a Bash solution...
Tricky input:
("a"|"b"|"c")."A"|"B"|"C".("e"|"f")."E"|"F"
Desired output:
("a"|"b"|"c")."ABC".("e"|"f")."EF"
Mainly, I think what I want to do is replace each "|" string with nothing, but only outside of any existing text in parentheses.
The problem gets crazier with the different forms of text input in my dataset: the combination of blocks (delimited by .) with and without parentheses varies.
Thanks in advance.
Something I've tried with sed:
gsed -E "s/(\.\"[[:graph:]]+)\"\|\"/\1/g" input.txt
The output I get is:
("a"|"b"|"c")."A"|"B"|"C".("e"|"f")."EF"
Looks like I'm only getting part of the desired output: the substitution only hits the last block.
Solution
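First, a quick look at why the sed attempt only touched the last block: [[:graph:]]+ is greedy and also matches periods, quotes and parens, so the capture group runs from the first ." all the way to the last "|" on the line, and the g flag finds nothing left to match. A small sketch that wraps the captured text in <...> to make this visible (the function name show_greedy_match is made up for illustration, and plain sed -E is assumed to behave like the gsed -E in the question):

```shell
# show_greedy_match: hypothetical helper, not part of the original attempt.
# Same regex as in the question, but the replacement brackets the capture
# group so it is easy to see how much text it swallowed.
show_greedy_match() {
    printf '%s\n' "$1" | sed -E 's/(\."[[:graph:]]+)"\|"/<\1>/g'
}

show_greedy_match '("a"|"b"|"c")."A"|"B"|"C".("e"|"f")."E"|"F"'
# prints: ("a"|"b"|"c")<."A"|"B"|"C".("e"|"f")."E>F"
```

Everything between < and > was consumed by a single match, which is why only the final "|" disappears.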
Assumptions/understandings:
- fields are separated by periods
- fields wrapped in parens are to be left alone
- all other fields keep one leading and one trailing double quote, while all interior double quotes, as well as pipes, are to be removed
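To check these assumptions against a line before changing anything, a minimal sketch can split on periods and label each field (show_fields is a hypothetical helper name; it assumes periods never occur inside a field):

```shell
# show_fields: hypothetical helper that prints one line per field:
# field number, what would happen to it, and the field itself.
show_fields() {
    printf '%s\n' "$1" | awk -F'.' '
    {
        for (i = 1; i <= NF; i++) {
            # fields wrapped in parens are kept; everything else is collapsed
            action = ($i ~ /^[(].*[)]$/) ? "keep" : "collapse"
            printf "%d %s %s\n", i, action, $i
        }
    }'
}

show_fields '("a"|"b"|"c")."A"|"B"|"C".("e"|"f")."E"|"F"'
# prints:
# 1 keep ("a"|"b"|"c")
# 2 collapse "A"|"B"|"C"
# 3 keep ("e"|"f")
# 4 collapse "E"|"F"
```

If a real input line shows a field split in an unexpected place, the period-as-separator assumption does not hold for that data.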
Sample data:
$ cat pipes.dat
("a"|"b"|"c")."A"|"B"|"C".("e"|"f")."E"|"F"
"j"|"K"|"L"."m"|"n"|"o"|"p".("x"|"y"|"z")
One awk idea:
awk '
BEGIN { FS=OFS="." }                                  # define input/output field separator as a period
      { printf "############\nbefore: %s\n",$0        # print a record separator and the current input line;
                                                      # solely for display purposes; this line can
                                                      # be removed/commented-out once logic is verified
        for (i=1; i<=NF; i++)                         # loop through fields
            if ( $i !~ "^[(].*[)]$" )                 # if field does not start/end with parens then ...
               $i="\"" gensub(/"|\|/,"","g",$i) "\""  # replace field with a new double quote (+) modified string
                                                      # whereby all double quotes and pipes are removed (+)
                                                      # a new ending double quote
        printf "after : %s\n",$0                      # print the newly modified line;
                                                      # can be replaced with "print" once logic is verified
      }
' pipes.dat                                           # read data from file; to read from a variable remove this line and ...
#' <<< "${variable_name}"                             # uncomment this line
The above generates:
############
before: ("a"|"b"|"c")."A"|"B"|"C".("e"|"f")."E"|"F"
after : ("a"|"b"|"c")."ABC".("e"|"f")."EF"
############
before: "j"|"K"|"L"."m"|"n"|"o"|"p".("x"|"y"|"z")
after : "jKL"."mnop".("x"|"y"|"z")
After removing comments and making the printf changes:
awk '
BEGIN { FS=OFS="." }
      { for (i=1; i<=NF; i++)
            if ( $i !~ "^[(].*[)]$" )
               $i="\"" gensub(/"|\|/,"","g",$i) "\""
        print
      }
' pipes.dat
Which generates:
("a"|"b"|"c")."ABC".("e"|"f")."EF"
"jKL"."mnop".("x"|"y"|"z")
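Note that gensub() is a GNU awk extension, so the scripts above need gawk. As a rough sketch of the same logic for other awk implementations, gsub() can edit the field in place instead (the wrapper name collapse_fields is hypothetical, and it reads from stdin rather than from pipes.dat):

```shell
# collapse_fields: hypothetical wrapper around the same per-field logic,
# using gsub() (available in POSIX awk) instead of the gawk-only gensub().
collapse_fields() {
    awk '
    BEGIN { FS = OFS = "." }               # split and rejoin on periods
    {
        for (i = 1; i <= NF; i++)
            if ($i !~ /^[(].*[)]$/) {      # leave paren-wrapped fields alone
                gsub(/["|]/, "", $i)       # strip every double quote and pipe
                $i = "\"" $i "\""          # re-wrap in a single pair of quotes
            }
        print
    }'
}

printf '%s\n' '("a"|"b"|"c")."A"|"B"|"C".("e"|"f")."E"|"F"' | collapse_fields
# prints: ("a"|"b"|"c")."ABC".("e"|"f")."EF"
```

To run it over the sample file, redirect the file into the function: collapse_fields < pipes.dat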
Answered By - markp-fuso