Issue
I am trying to write a Bash script that extracts the 10 most common double-vowel words from a file, such as good, teeth, etc. Here is what I have so far:
grep -E -o '[aeiou]{2}' $1|tr 'A-Z' 'a-z' |sort|uniq -c|sort -n | tail -10
I tried to use grep with the -E flag to match patterns such as 'aa', 'ee', 'ii', etc., but it is not working at all: what I get back is just 'ai', 'ea', and the like. Can anyone help me figure out how to do this pattern match in a Bash script?
Solution
You can simply match any number of letters before and after a repeated vowel. With GNU grep, which supports backreferences in -E patterns as an extension to POSIX ERE, the regex is:
grep -oE '[[:alpha:]]*([aeiou])\1[[:alpha:]]*' words.txt
FreeBSD (non-GNU) grep does not support a backreference in this pattern, so there you will have to list all the doubled vowels explicitly:
grep -oE '[[:alpha:]]*(aa|ee|ii|oo|uu)[[:alpha:]]*' words.txt
See the online demo:
#!/bin/bash
s='Some good feed
Soot and weed'
grep -oE '[[:alpha:]]*([aeiou])\1[[:alpha:]]*' <<< "$s"
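To get from the matched words to the 10 most common ones, the pattern can be dropped into the counting pipeline from the question. The following is only a sketch under a few assumptions: top_double_vowel_words is a hypothetical helper name, -i is added so that capitalised words are matched before lowercasing, and the alternation form is used so that it does not depend on backreference support.

```shell
#!/bin/bash
# Hypothetical helper (name not from the answer): print the N most frequent
# words that contain a doubled vowel, most frequent first.
# Usage: top_double_vowel_words FILE [N]   (N defaults to 10)
top_double_vowel_words() {
  # -o prints each matching word on its own line; -i also matches "OO", "Ee", ...
  grep -oiE '[[:alpha:]]*(aa|ee|ii|oo|uu)[[:alpha:]]*' "$1" |
    tr '[:upper:]' '[:lower:]' |   # fold case so "Good" and "good" count together
    sort |                         # uniq -c needs identical lines to be adjacent
    uniq -c |                      # prefix each distinct word with its count
    sort -rn |                     # highest count first
    head -n "${2:-10}"             # keep the top N
}
```

Inside the script from the question, it could be called as top_double_vowel_words "$1". Using sort -rn with head puts the most common word on the first line, whereas the sort -n | tail -10 in the question lists it last.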
Details:
[[:alpha:]]* - zero or more letters
(aa|ee|ii|oo|uu) - one of the char sequences aa, ee, ii, oo or uu (| is an alternation operator in a POSIX ERE regex)
([aeiou]) - Group 1: a vowel
\1 - the same vowel as in Group 1
[[:alpha:]]* - zero or more letters
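The difference from the [aeiou]{2} attempt in the question is that {2} repeats the bracket expression, not the character it matched, so any two adjacent vowels qualify. A small sketch on a made-up sample string (the variable names are illustrative only):

```shell
#!/bin/bash
s='rain good teeth'   # made-up sample text

# {2} means "a vowel, then another vowel", so mixed pairs also match
any_two=$(printf '%s\n' "$s" | grep -oE '[aeiou]{2}')
# listing the doubled vowels accepts only a vowel followed by itself
same_two=$(printf '%s\n' "$s" | grep -oE 'aa|ee|ii|oo|uu')

# Join the one-match-per-line output with spaces for display
printf 'any two vowels: %s\n' "$(printf '%s' "$any_two" | tr '\n' ' ')"
printf 'doubled vowels: %s\n' "$(printf '%s' "$same_two" | tr '\n' ' ')"
```

The first pattern reports ai, oo and ee, while the second reports only oo and ee, which is consistent with the 'ai', 'ea' results described in the question.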
Answered By - Wiktor Stribiżew Answer Checked By - Mary Flores (WPSolving Volunteer)