Friday, May 27, 2022

[SOLVED] Bash script to extract the 10 most common double-vowel words from a file

Issue

I have tried to write a Bash script to extract the 10 most common double-vowel words from a file, such as good, teeth, etc. Here is what I have so far:

grep -E -o '[aeiou]{2}' $1|tr 'A-Z' 'a-z' |sort|uniq -c|sort -n | tail -10

I tried to use grep with the -E flag to match patterns such as 'aa', 'ee', 'ii', etc., but it is not working at all: what I get back is just 'ai', 'ea', and the like. Can anyone help me figure out how to do this pattern match in a Bash script?


Solution

You can simply match any number of letters before and after a repeated vowel with this ERE regex in GNU grep (the \1 backreference is a GNU extension, not part of POSIX ERE):

grep -oE '[[:alpha:]]*([aeiou])\1[[:alpha:]]*' words.txt

FreeBSD (non-GNU) grep does not support a backreference in the pattern, so you will have to list all possible vowel sequences:

grep -oE '[[:alpha:]]*(aa|ee|ii|oo|uu)[[:alpha:]]*' words.txt

Here is a demo:

#!/bin/bash
s='Some good feed
Soot and weed'
grep -oE '[[:alpha:]]*([aeiou])\1[[:alpha:]]*' <<< "$s"

Details:

  • [[:alpha:]]* - zero or more letters
  • (aa|ee|ii|oo|uu) - one of the char sequences, aa, ee, ii, oo or uu (| is an alternation operator in a POSIX ERE regex)
  • ([aeiou]) - Group 1: a vowel
  • \1 - the same vowel as in Group 1
  • [[:alpha:]]* - zero or more letters
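
The patterns above only extract the matching words; the question also asks for the 10 most common ones. A minimal sketch of the full pipeline follows, assuming the input is a plain-text file passed as the first argument. The function name top_double_vowel_words and its optional count argument are hypothetical, introduced only for illustration. It lowercases the text first so that "Soot" and "soot" are counted together, and it uses the alternation form so it does not depend on backreference support. Note that the original attempt failed because [aeiou]{2} matches any two consecutive vowels (such as "ai"), not the same vowel twice.

```shell
#!/bin/bash
# Hypothetical helper (not from the original answer):
# print the N most frequent words that contain a doubled vowel.
# Usage: top_double_vowel_words FILE [N]   (N defaults to 10)
top_double_vowel_words() {
    local file=$1 n=${2:-10}
    # 1. tr    lowercases the input so counts are case-insensitive
    # 2. grep  prints every word containing aa, ee, ii, oo or uu, one per line
    # 3. sort | uniq -c   counts identical words
    # 4. sort -rn | head  keeps the N highest counts, most frequent first
    tr '[:upper:]' '[:lower:]' < "$file" |
        grep -oE '[[:alpha:]]*(aa|ee|ii|oo|uu)[[:alpha:]]*' |
        sort | uniq -c | sort -rn | head -n "$n"
}
```

Calling top_double_vowel_words words.txt would then print up to ten lines, each holding a count followed by a word.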




Answered By - Wiktor Stribiżew
Answer Checked By - Mary Flores (WPSolving Volunteer)