Issue
I have a list of sentences (one sentence per line), and a dictionary (a list of words, one word per line). I want to use awk, grep or sed to edit the sentences file such that only the words that are in my dictionary file are kept. For example, dictionary:
hello
dog
lost
I
miss
computer
buy
input file:
I miss my dog
I want to buy a new computer
result:
I miss dog
I buy computer
I know this can be done easily with Python, but I'm trying to use terminal commands (awk, sed, grep, or any other terminal command).
Thank you.
Solution
In Python I would just read the word list file, create a list of strings with the words, then read the input file and output the word if it exists in the array.
And that's how you'd do it in awk too:
$ awk 'FNR == NR { dict[$0] = 1; next } # Read the dictionary file
       { # And for each word of each line of the sentence file
         for (word = 1; word <= NF; word++) {
           if ($word in dict) # See if it is in the dictionary
             printf "%s ", $word
         }
         printf "\n"
       }' dict.txt input.txt
I miss dog
I buy computer
(This does leave a trailing space on each line, but that's easy to filter out if it matters)
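If the trailing space does matter, one way to avoid it is to build each output line in a variable and only put a separator between kept words. The following is a minimal sketch of that idea, not part of the original answer: the scratch directory and the variable names (tmpdir, result, out, sep) are made up for illustration, and the two files are recreated from the example in the question so the snippet runs on its own. Matching is still exact and case-sensitive, as in the version above.

```shell
# Hypothetical scratch files mirroring the example from the question.
tmpdir=$(mktemp -d)
printf '%s\n' hello dog lost I miss computer buy > "$tmpdir/dict.txt"
printf '%s\n' 'I miss my dog' 'I want to buy a new computer' > "$tmpdir/input.txt"

result=$(awk '
    FNR == NR { dict[$0] = 1; next }   # First file: load the dictionary
    {
        out = ""; sep = ""             # Rebuild the line from kept words only
        for (i = 1; i <= NF; i++) {
            if ($i in dict) {
                out = out sep $i       # Separator goes before every word but the first
                sep = " "
            }
        }
        print out
    }' "$tmpdir/dict.txt" "$tmpdir/input.txt")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

On the sample data this prints the same two lines as before (I miss dog, I buy computer), just without the space at the end of each line. A line with no dictionary words comes out empty rather than being dropped.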
Answered By - Shawn Answer Checked By - David Goodson (WPSolving Volunteer)