Issue
I am looking for a way to split a string in bash over a delimiter string, and place the parts in an array.
Simple case:
#!/bin/bash
b="aaaaa/bbbbb/ddd/ffffff"
echo "simple string: $b"
IFS='/' b_split=($b)
echo ;
echo "split"
for i in ${b_split[@]}
do
echo "------ new part ------"
echo "$i"
done
Gives output
simple string: aaaaa/bbbbb/ddd/ffffff
split
------ new part ------
aaaaa
------ new part ------
bbbbb
------ new part ------
ddd
------ new part ------
ffffff
More complex case:
#!/bin/bash
c=$(echo "AA=A"; echo "B=BB"; echo "======="; echo "C==CC"; echo "DD=D"; echo "======="; echo "EEE"; echo "FF";)
echo "more complex string"
echo "$c";
echo ;
echo "split";
IFS='=======' c_split=($c) ;# <---- LINE TO BE CHANGED
for i in ${c_split[@]}
do
echo "------ new part ------"
echo "$i"
done
Gives output:
more complex string
AA=A
B=BB
=======
C==CC
DD=D
=======
EEE
FF
split
------ new part ------
AA
------ new part ------
A
B
------ new part ------
BB
------ new part ------
------ new part ------
------ new part ------
------ new part ------
------ new part ------
------ new part ------
------ new part ------
C
------ new part ------
------ new part ------
CC
DD
------ new part ------
D
------ new part ------
------ new part ------
------ new part ------
------ new part ------
------ new part ------
------ new part ------
------ new part ------
EEE
FF
I would like the second output to be like
------ new part ------
AA=A
B=BB
------ new part ------
C==CC
DD=D
------ new part ------
EEE
FF
I.e. to split the string on a sequence of characters, instead of one. How can I do this?
I am looking for an answer that would only modify this line in the second script:
IFS='=======' c_split=($c) ;# <---- LINE TO BE CHANGED
Solution
Introduction
At the bottom of this answer you will find a function that splits a string into an array, with the following syntax:
ssplit "<string>" "<array name>" "<delimiter string>"
For example:
ssplit "$c" c_split $'\n=======\n'
declare -p c_split
declare -a c_split=([0]=$'AA=A\nB=BB' [1]=$'C==CC\nDD=D' [2]=$'EEE\nFF')
IFS disambiguation
IFS stands for Internal Field Separator. It holds a list of characters, each of which can be used as a separator.
By default it is set to space, tab and newline ($' \t\n'), meaning that any non-empty run of spaces, tabs and/or newlines counts as one separator.
So with the string: $' blah foo=bar \nbaz '
read -a c_split <<<" blah foo=bar
baz "
declare -p c_split
declare -a c_split=([0]="blah" [1]="foo=bar")
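Leading and trailing separators are ignored. Note that read only consumes the first line, which is why the array above holds just blah and foo=bar; if the whole string were word-split, it would contain only 3 parts: blah, foo=bar and baz.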
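To see all three parts, the whole string has to reach read. A minimal sketch, assuming bash: -d '' makes read consume input up to the end of the data instead of stopping at the first newline (it then returns non-zero at end of input, hence the || true). The variable names s and words are only illustrative.

```shell
# Bash only. Word-split the complete string, newline included.
s=$'    blah  foo=bar \nbaz  '

# -d '' : read up to a NUL byte or end of input, not just the first line.
# read returns non-zero when it hits end of input, so the status is ignored.
read -r -d '' -a words <<<"$s" || true

declare -p words
```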
But apart from whitespace, each separator character in IFS counts on its own, so two consecutive separators delimit an empty field:
IFS=Z read a b c d e f <<<ZaZZbZcZZdZeZf
declare -p a b c d e f
declare -- a=""
declare -- b="a"
declare -- c=""
declare -- d="b"
declare -- e="c"
declare -- f="ZdZeZf"
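This is also why the IFS='=======' attempt in the question cannot work: IFS is a set of single characters, so seven = signs mean exactly the same as one. A small sketch, assuming bash (the names s and fields are illustrative):

```shell
# Bash only. IFS is a set of characters, never a multi-character string.
s='a=b==c'

# '==' behaves exactly like '=': every single '=' ends a field,
# so the doubled '==' in the data produces an empty field.
IFS='==' read -r -a fields <<<"$s"

declare -p fields
echo "number of fields: ${#fields[@]}"
```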
Splitting a string using IFS is possible if you know a valid field separator that is not used in your string: you can then replace your pattern with this character (using the ${var//<pattern>/<separator>} syntax):
OIFS="$IFS"
IFS='§'
c=$'AA=A\nB=BB\n=======\nC==CC\nDD=D\n=======\nEEE\nFF'
c_split=(${c//=======/§})
IFS="$OIFS"
printf -- "------ new part ------\n%s\n" "${c_split[@]}"
------ new part ------
AA=A
B=BB
------ new part ------
C==CC
DD=D
------ new part ------
EEE
FF
But this works only as long as the string does not contain any §.
You could use another character, like IFS=$'\026';c_split=(${c//=======/$'\026'}), but this may still lead to further bugs.
You could scan the character map to find one that is not in your string:
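myIfs=""
for i in {1..255};do
    # Build the character whose code is $i (octal escape, e.g. \001)
    printf -v char "$(printf "\\\%03o" $i)"
    # Removing "up to $char" changes nothing only if $char is absent from $c
    [ "$c" == "${c#*"$char"}" ] && myIfs="$char" && break
done
if ! [ "$myIfs" ] ;then
    echo no split char found, could not do the job, sorry.
    exit 1
fi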
but I find this solution a little overkill.
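As a lighter variant of the placeholder idea, here is a sketch that keeps the IFS change local to a single read call and avoids the unquoted expansion altogether. It assumes bash and assumes that the ASCII unit separator (octal 037) never appears in the data; the names delim, sep and parts are illustrative.

```shell
# Bash only. Placeholder-based split without touching the global IFS.
c=$'AA=A\nB=BB\n=======\nC==CC\nDD=D\n=======\nEEE\nFF'
delim=$'\n=======\n'
sep=$'\037'   # ASCII unit separator; ASSUMED to be absent from $c

# Replace every delimiter string with the placeholder, then let read split
# on it. IFS is only set for this one command. -d '' reads to end of input,
# which makes read return non-zero, hence the '|| true'.
IFS=$sep read -r -d '' -a parts < <(printf '%s' "${c//"$delim"/$sep}") || true

printf -- '------ new part ------\n%s\n' "${parts[@]}"
```

Because nothing is expanded unquoted, characters such as * in the data are not subject to pathname expansion.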
Splitting on spaces (or without modifying IFS)
Under bash, we could use this bashism:
b="aaaaa/bbbbb/ddd/ffffff"
b_split=(${b//// })
In fact, the ${varname// syntax starts a substitution (delimited by /) that replaces every occurrence of / with a space before the result is assigned to the array b_split.
Of course, this still uses IFS and splits the string on whitespace.
This is not the best way, but it can work in specific cases.
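One of the cases where it does not work well: because the expansion is unquoted, each resulting word is also subject to pathname expansion, so a part such as * would be replaced by file names. A hedged sketch, assuming bash with the default IFS, that switches globbing off around the split:

```shell
# Bash only, default IFS assumed.
b='*/x'

set -f                  # disable pathname expansion so '*' stays literal
b_split=(${b//// })     # replace every '/' with a space, then word-split
set +f                  # restore pathname expansion

printf '<%s>\n' "${b_split[@]}"
```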
You could even drop unwanted spaces before splitting:
b='12 34 / 1 3 5 7 / ab'
b1=${b// }
b_split=(${b1//// })
printf "<%s>, " "${b_split[@]}" ;echo
<1234>, <1357>, <ab>,
or swap them for a placeholder and restore them afterwards:
b1=${b// /§}
b_split=(${b1//// })
printf "<%s>, " "${b_split[@]//§/ }" ;echo
<12 34 >, < 1 3 5 7 >, < ab>,
Splitting a line on delimiter strings
So IFS cannot do what you want here, but bash does have nice features for it:
#!/bin/bash
c=$'AA=A\nB=BB\n=======\nC==CC\nDD=D\n=======\nEEE\nFF'
echo "more complex string"
echo "$c";
echo ;
echo "split";
mySep='======='
while [ "$c" != "${c#*$mySep}" ];do
echo "------ new part ------"
echo "${c%%$mySep*}"
c="${c#*$mySep}"
done
echo "------ last part ------"
echo "$c"
Let's see:
more complex string
AA=A
B=BB
=======
C==CC
DD=D
=======
EEE
FF
split
------ new part ------
AA=A
B=BB
------ new part ------
C==CC
DD=D
------ last part ------
EEE
FF
About leading and trailing newlines
Leading and trailing newlines are not removed in the previous samples. To get rid of them, you could simply use:
mySep=$'\n=======\n'
instead of =======.
Or you could rewrite the split loop to strip them explicitly:
mySep=$'======='
while [ "$c" != "${c#*$mySep}" ];do
echo "------ new part ------"
part="${c%%$mySep*}"
part="${part##$'\n'}"
echo "${part%%$'\n'}"
c="${c#*$mySep}"
done
echo "------ last part ------"
c=${c##$'\n'}
echo "${c%%$'\n'}"
In any case, this matches what the SO question asked for (and its sample):
------ new part ------
AA=A
B=BB
------ new part ------
C==CC
DD=D
------ last part ------
EEE
FF
Finally, creating an array
#!/bin/bash
c=$'AA=A\nB=BB\n=======\nC==CC\nDD=D\n=======\nEEE\nFF'
echo "more complex string"
echo "$c";
echo ;
echo "split";
mySep=$'======='
export -a c_split
while [ "$c" != "${c#*$mySep}" ];do
part="${c%%$mySep*}"
part="${part##$'\n'}"
c_split+=("${part%%$'\n'}")
c="${c#*$mySep}"
done
c=${c##$'\n'}
c_split+=("${c%%$'\n'}")
for i in "${c_split[@]}"
do
echo "------ new part ------"
echo "$i"
done
This does the job nicely:
more complex string
AA=A
B=BB
=======
C==CC
DD=D
=======
EEE
FF
split
------ new part ------
AA=A
B=BB
------ new part ------
C==CC
DD=D
------ new part ------
EEE
FF
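Some explanations:
export -a var declares var as an array. Note that bash cannot actually pass arrays to child processes, so declare -a var would do just as well here.
${variablename%string*} and ${variablename%%string*} return the left part of variablename, up to but not including string. A single % cuts at the last occurrence of string (the shortest match is removed) and %% cuts at the first occurrence (the longest match is removed). The full value is returned if string is not found.
${variablename#*string} and ${variablename##*string} do the same in the other direction: they return the right part of variablename, after string. A single # cuts after the first occurrence and ## cuts after the last one.
Note that in these patterns the character * is a wildcard that matches any number of any characters.
The command echo "${c%%$'\n'}" echoes variable c with one trailing newline removed, if there is one. The pattern is a single literal newline, so % and %% behave identically here.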
So if variable contains Hello WorldZorGluBHello youZorGluBI'm happy:
variable="Hello WorldZorGluBHello youZorGluBI'm happy"
$ echo ${variable#*ZorGluB}
Hello youZorGluBI'm happy
$ echo ${variable##*ZorGluB}
I'm happy
$ echo ${variable%ZorGluB*}
Hello WorldZorGluBHello you
$ echo ${variable%%ZorGluB*}
Hello World
$ echo ${variable%%ZorGluB}
Hello WorldZorGluBHello youZorGluBI'm happy
$ echo ${variable%happy}
Hello WorldZorGluBHello youZorGluBI'm
$ echo ${variable##* }
happy
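Since a pattern like $'\n' matches exactly one newline, ${c##$'\n'} and ${c%%$'\n'} strip at most one newline each. If a part may carry several, a small loop can remove them all. This is only a sketch, assuming bash; the sample value and the name nl are illustrative.

```shell
# Bash only. Strip every leading and trailing newline from a string.
c=$'\n\nEEE\nFF\n\n\n'
nl=$'\n'

# Remove one leading newline per iteration until none is left.
while [[ $c == "$nl"* ]]; do c=${c#"$nl"}; done
# Same for trailing newlines.
while [[ $c == *"$nl" ]]; do c=${c%"$nl"}; done

printf '<%s>\n' "$c"
```

The newline between EEE and FF is untouched, because only the two ends of the string are trimmed.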
All this is explained in the manpage:
$ man -Len -Pless\ +/##word bash
$ man -Len -Pless\ +/%%word bash
$ man -Len -Pless\ +/^\\\ *export\\\ .*word bash
Step by step, the splitting loop:
The separator:
mySep=$'======='
Declare c_split as an array (export -a is accepted, though bash cannot pass arrays to child processes):
export -a c_split
While variable c contains at least one occurrence of mySep:
while [ "$c" != "${c#*$mySep}" ];do
Truncate c from the first mySep to the end of the string and assign the result to part:
part="${c%%$mySep*}"
Remove one leading newline, if any:
part="${part##$'\n'}"
Remove one trailing newline, if any, and add the result as a new array element to c_split:
c_split+=("${part%%$'\n'}")
Reassign c with the rest of the string, once everything up to and including mySep is removed:
c="${c#*$mySep}"
Done ;-)
done
Remove one leading newline from the last part:
c=${c##$'\n'}
Remove one trailing newline and add the result as a new array element to c_split:
c_split+=("${c%%$'\n'}")
Into a function:
ssplit() {
local string="$1" array=${2:-ssplited_array} delim="${3:- }" pos=0
while [ "$string" != "${string#*$delim}" ];do
printf -v $array[pos++] "%s" "${string%%$delim*}"
string="${string#*$delim}"
done
printf -v $array[pos] "%s" "$string"
}
Usage:
ssplit "<quoted string>" [array name] [delimiter string]
where the array name defaults to ssplited_array and the delimiter defaults to a single space.
For example:
c=$'AA=A\nB=BB\n=======\nC==CC\nDD=D\n=======\nEEE\nFF'
ssplit "$c" c_split $'\n=======\n'
printf -- "--- part ----\n%s\n" "${c_split[@]}"
--- part ----
AA=A
B=BB
--- part ----
C==CC
DD=D
--- part ----
EEE
FF
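One thing to keep in mind with ssplit as written: the delimiter is used unquoted inside ${string#*$delim}, so it is treated as a glob pattern, and the target array is not emptied first. The following is a hypothetical variant (not part of the original answer) that treats the delimiter literally and resets the array. It assumes bash 4.3 or later for local -n, and the caller must not name its array _out, _str or _delim.

```shell
# Bash 4.3+ only. Hypothetical variant of ssplit with a literal delimiter.
ssplit_literal() {
    local _str=$1 _delim=${3:- }
    local -n _out=${2:-ssplited_array}   # nameref to the caller's array
    _out=()                              # start from an empty array
    # Quoting "$_delim" makes glob characters such as * match literally.
    while [[ $_str == *"$_delim"* ]]; do
        _out+=("${_str%%"$_delim"*}")    # text before the first delimiter
        _str=${_str#*"$_delim"}          # drop that text and the delimiter
    done
    _out+=("$_str")                      # whatever remains is the last part
}

ssplit_literal 'a*b*c' parts '*'
declare -p parts

c=$'AA=A\nB=BB\n=======\nC==CC\nDD=D\n=======\nEEE\nFF'
ssplit_literal "$c" c_split $'\n=======\n'
printf -- "--- part ----\n%s\n" "${c_split[@]}"
```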
Answered By - F. Hauri - Give Up GitHub Answer Checked By - Marilyn (WPSolving Volunteer)