Issue
I need to read ids from one file and search for each of them in a second XML file; whenever an id is found, the entire matching line should be written to a third file. file1 is 111 MB and file2 is 40 GB.
File1.xml
id1
id2
id5
File2.xml
<employees>
<employee><id>id1</id><name>test1</name></employee>
<employee><id>id2</id><name>test2</name></employee>
<employee><id>id3</id><name>test3</name></employee>
<employee><id>id4</id><name>test4</name></employee>
<employee><id>id5</id><name>test5</name></employee>
<employee><id>id6</id><name>test6</name></employee>
</employees>
File3.xml : result
<employee><id>id1</id><name>test1</name></employee>
<employee><id>id2</id><name>test2</name></employee>
<employee><id>id5</id><name>test5</name></employee>
I tried it with grep:
grep -i -f file1.xml file2.xml >> file3.xml
but it fails with a "memory exhausted" error.
I also tried a loop with awk:
while read -r id; do
    awk -v pat="$id" '$0 ~ pat' file2.xml >> file3.xml
done < file1.xml
but that takes far too long, since it rescans the whole 40 GB file once per id. What would be the most efficient solution for this?
Solution
This should work in any awk version:
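awk 'FNR == NR {
   # first file: store each id, wrapped in its <id> tags, as an array key
   seen["<id>" $1 "</id>"]
   next
}
# second file: print the line if its <id>...</id> element is a stored key
match($0, /<id>[^<]*<\/id>/) && substr($0, RSTART, RLENGTH) in seen
' file1 file2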
Output:
<employee><id>id1</id><name>test1</name></employee>
<employee><id>id2</id><name>test2</name></employee>
<employee><id>id5</id><name>test5</name></employee>
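To try this end to end on the sample data from the question, a minimal sketch along these lines may help. The scratch directory and the file names file1.xml, file2.xml and file3.xml are assumptions for illustration only. The approach keeps all ids from the first file in memory as array keys and reads the second file a single time, so the 40 GB file is never rescanned.

```shell
# Work in a scratch directory so no real files are touched (illustrative names).
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Sample id list, one id per line.
printf '%s\n' id1 id2 id5 > file1.xml

# Sample data file, one <employee> element per line.
cat > file2.xml <<'EOF'
<employees>
<employee><id>id1</id><name>test1</name></employee>
<employee><id>id2</id><name>test2</name></employee>
<employee><id>id3</id><name>test3</name></employee>
<employee><id>id4</id><name>test4</name></employee>
<employee><id>id5</id><name>test5</name></employee>
<employee><id>id6</id><name>test6</name></employee>
</employees>
EOF

# Same awk program as above, with the matching lines redirected to a third file.
awk 'FNR == NR { seen["<id>" $1 "</id>"]; next }
     match($0, /<id>[^<]*<\/id>/) && substr($0, RSTART, RLENGTH) in seen
' file1.xml file2.xml > file3.xml

cat file3.xml
```

Because the lookup key is the whole `<id>...</id>` element rather than the bare id, an id such as id1 would not accidentally match a line containing id10, which a plain substring search would.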
Answered By - anubhava Answer Checked By - Timothy Miller (WPSolving Admin)