Issue
I am trying to extract all the word pairs from a piece of text.
I run the regular expression (\w+) +(\w+) on text with no punctuation. My issue is that it does not return every possible pair:
$ echo "hello dear world" | grep -Eoi "(\w+) +(\w+)"
hello dear
I want the following output instead:
$ echo "hello dear world" | grep -Eoi [some expression]
hello dear
dear world
Solution
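Traditional grep won't return capture groups. In addition, each match consumes the text it covers, so -o never reports overlapping matches: once "hello dear" has matched, the search resumes after "dear".
You can consider pcregrep with a lookahead and 2 capture groups. The lookahead captures the following word without consuming it, so the next match can start at that word: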
echo "hello dear world" | pcregrep -o1 -o2 '(\w+)(?=(\h+\w+))'
hello dear
dear world
If you don't have pcregrep, you can use this simple awk command, which prints each field together with the one that follows it:
awk '{for (i=1; i<NF; ++i) print $i OFS $(i+1)}' <<< "hello dear world"
hello dear
dear world
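If awk is not an option either, the same sliding-window idea can be sketched in plain POSIX shell. This is only an illustration under a few assumptions: word_pairs is a hypothetical helper name, words are separated by spaces, tabs, or newlines, and the whole argument is treated as one sequence (so, unlike the awk version, pairs can span line breaks).

```shell
# word_pairs: hypothetical helper that prints every pair of adjacent words.
# The body runs in a subshell ( ... ) so that set -f and the variables
# used here do not leak into the calling shell.
word_pairs() (
  set -f        # disable globbing so a word such as * is not expanded
  prev=
  # $1 is deliberately unquoted: the shell splits it on whitespace.
  for word in $1; do
    if [ -n "$prev" ]; then
      printf '%s %s\n' "$prev" "$word"
    fi
    prev=$word
  done
)

word_pairs "hello dear world"
```

For the sample input this prints the same two lines as the awk command above. A single word, or an empty string, produces no output.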
Answered By - anubhava