Friday, May 27, 2022

[SOLVED] Unix command to search file1 id in another file2 and write result to a file3

Issue

I need to read ids from one file and search for each of them in a second XML file; when an id is found, the entire matching line should be written to a third file. file1 is 111 MB and file2 is 40 GB.

File1.xml

id1
id2
id5

File2.xml

<employees>
<employee><id>id1</id><name>test1</name></employee>
<employee><id>id2</id><name>test2</name></employee>
<employee><id>id3</id><name>test3</name></employee>
<employee><id>id4</id><name>test4</name></employee>
<employee><id>id5</id><name>test5</name></employee>
<employee><id>id6</id><name>test6</name></employee>
</employees>

File3.xml : result

<employee><id>id1</id><name>test1</name></employee>
<employee><id>id2</id><name>test2</name></employee>
<employee><id>id5</id><name>test5</name></employee>

I tried it with grep:

grep -i -f file1.xml file2.xml >> file3.xml

but it fails with a "memory exhausted" error.

I also tried a loop with awk:

while read -r id; do
    awk -v pat="$id" '$0 ~ pat' file2.xml >> file3.xml
done < file1.xml

This also takes too long, because it rescans the whole of file2.xml once per id. What would be the most efficient solution for this?


Solution

This should work in any awk version, and it reads each file only once:

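awk 'FNR == NR {                     # true only while reading the first file
   seen["<id>" $1 "</id>"]          # store each id, wrapped in its tags, as a key
   next
}
# second file: print the line if its <id>...</id> element is a stored key
match($0, /<id>[^<]*<\/id>/) && substr($0, RSTART, RLENGTH) in seen
' file1.xml file2.xml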

<employee><id>id1</id><name>test1</name></employee>
<employee><id>id2</id><name>test2</name></employee>
<employee><id>id5</id><name>test5</name></employee>
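
The question asks for the matches to be written to a third file rather than printed. As a minimal sketch, the same command can be run on the sample data from the question with its output redirected to file3.xml. The temporary directory and the `workdir` variable are assumptions made for illustration, not part of the original answer; on the real data you would only need the redirect at the end of the awk command.

```shell
#!/bin/sh
# Sketch: rebuild the sample files from the question and save the matches.
# "workdir" is a hypothetical scratch directory used only for this demo.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# file1.xml: one id per line
printf '%s\n' id1 id2 id5 > "$workdir/file1.xml"

# file2.xml: one <employee> record per line, ids id1..id6
{
  printf '%s\n' '<employees>'
  for n in 1 2 3 4 5 6; do
    printf '<employee><id>id%s</id><name>test%s</name></employee>\n' "$n" "$n"
  done
  printf '%s\n' '</employees>'
} > "$workdir/file2.xml"

# First file: remember every id as an array key.
# Second file: print lines whose <id>...</id> element is one of those keys.
awk 'FNR == NR {
   seen["<id>" $1 "</id>"]
   next
}
match($0, /<id>[^<]*<\/id>/) && (substr($0, RSTART, RLENGTH) in seen)
' "$workdir/file1.xml" "$workdir/file2.xml" > "$workdir/file3.xml"

cat "$workdir/file3.xml"
```

Only the ids from file1 are held in memory; file2 is streamed line by line, so its 40 GB size should affect run time rather than memory use. Because the lookup key includes the surrounding `<id>` tags, an id such as id1 does not match a longer id such as id10.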


Answered By - anubhava
Answer Checked By - Timothy Miller (WPSolving Admin)