Sunday, June 5, 2022

[SOLVED] Use grep or sed to keep only the words that are in another word list file

Issue

I have a list of sentences (one sentence per line), and a dictionary (a list of words, one word per line). I want to use awk, grep or sed to edit the sentences file such that only the words that are in my dictionary file are kept. For example, dictionary:

hello
dog
lost
I
miss
computer
buy

input file:

I miss my dog
I want to buy a new computer

result:

I miss dog
I buy computer

I know this can be done easily with Python, but I'm trying to use terminal commands (awk, sed, grep, or any other terminal command).

Thank you.


Solution

In Python I would just read the word list file, create a list of strings with the words, then read the input file and output each word only if it exists in that list.

And that's how you'd do it in awk too:

$ awk 'FNR == NR { dict[$0] = 1; next } # Read the dictionary file
       { # And for each word of each line of the sentence file
         for (word = 1; word <= NF; word++) {
           if ($word in dict) # See if it's in the dictionary
             printf "%s ", $word
         }
         printf "\n"
       }' dict.txt input.txt
I miss dog
I buy computer

(This does leave a trailing space on each line, but that's easy to filter out if it matters.)
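
If the trailing space does matter, one way to avoid it is to build each output line in a variable and print it once at the end. The following is a minimal sketch rather than part of the original answer: the function name filter_words is hypothetical, and the sample files are recreated from the question so the snippet runs on its own.

```shell
# Recreate the sample files from the question (assumed names: dict.txt, input.txt).
printf '%s\n' hello dog lost I miss computer buy > dict.txt
printf '%s\n' 'I miss my dog' 'I want to buy a new computer' > input.txt

# filter_words DICT SENTENCES  (hypothetical helper name)
# Prints each sentence with only the dictionary words kept, single-space separated.
filter_words() {
  awk 'FNR == NR { dict[$0] = 1; next }   # first file: load the dictionary
       {
         out = ""                          # rebuild the line from the kept words
         for (i = 1; i <= NF; i++)
           if ($i in dict)
             out = (out == "" ? $i : out " " $i)
         print out                         # no trailing space
       }' "$1" "$2"
}

filter_words dict.txt input.txt
```

Like the original, this compares whole whitespace-separated fields exactly and case-sensitively, so a word with punctuation attached (for example "dog.") would not match the dictionary entry "dog".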



Answered By - Shawn
Answer Checked By - David Goodson (WPSolving Volunteer)