Saturday, November 13, 2021

[SOLVED] How to (optimally) pick a single normalized random word from a file with bash / sed / shuf?

Issue

I'm looking to pick a random word from /usr/share/dict/words, remove any non-alphabetic (English) characters, and lower-case the output. Here's what I have so far:

sed "$(shuf -i "1-$(wc -l < /usr/share/dict/words)" -n 1)q;d" /usr/share/dict/words | tr '[:upper:]' '[:lower:]' | sed 's/[^-a-z]//g'

This works fine but is it possible to do it all in the one sed command?
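If GNU sed is available, the case-folding and stripping can indeed be folded into a single sed invocation using the GNU-only `\L` replacement extension (a sketch, not portable to POSIX sed; the demo below runs on sample lines from the question, but on a real system you would point `$words` at /usr/share/dict/words):

```shell
# Demo file with a few lines from the question's word-list excerpt.
words=$(mktemp)
printf "AMD's\nAOL's\nAachen's\n" > "$words"

# Pick a random line number, then let one GNU sed call select that line,
# strip non-letters, lower-case it with \L, print, and quit.
n=$(shuf -i "1-$(wc -l < "$words")" -n 1)
sed -n "${n}{s/[^A-Za-z]//g; s/.*/\L&/p; q}" "$words"
```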


EDIT: The American word file looks like this:

A
A's
AMD
AMD's
AOL
AOL's
AWS
AWS's
Aachen
Aachen's

I'm looking to make this lower-case and remove any non-alphabetic characters (as mentioned in my original question). The solution I have works fine but I'm hoping to reduce the number of commands (maybe just sed?). Output of the above would then be:

a
as
amd
amds
aol
aols
aws
awss
aachen
aachens

Solution

You don't need sed and wc -- shuf can pick a random line directly from a file.
tr can delete the non-alphabetic characters, so you don't need sed for that either:

shuf -n1 /usr/share/dict/words | tr -dc '[:alpha:]' | tr '[:upper:]' '[:lower:]'
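For reference, the normalization stage can be checked on a fixed input line ("AMD's" is taken from the word-list excerpt above), independent of the random pick:

```shell
# Run the answer's two tr stages on a known line:
# -dc '[:alpha:]' deletes everything that is not a letter
# (including the trailing newline), then the second tr lower-cases.
printf "AMD's\n" | tr -dc '[:alpha:]' | tr '[:upper:]' '[:lower:]'
# prints: amds
```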


Answered By - glenn jackman