Issue
I have two directories on my Linux system, /dir and /dir2.
Both contain more than 4,000 JSON files. The content of every file looks like this:
{
  "someattribute": "someValue",
  "url": [
    "https://www.someUrl.com/xyz"
  ],
  "someotherattribute": "someOtherValue"
}
Note that url is an array, but it always contains exactly one element (the URL).
The URL is what makes a file unique: if a file with the same URL exists in both /dir
and /dir2, it is a duplicate and needs to be deleted.
I want to automate this, preferably with a shell command. Any suggestions on how I should go about it?
Solution
Use jq to get a NUL-delimited list of the duplicate files:
jq -nrj '[
  foreach inputs.url as [$url] ({};
    .[$url] += 1;
    if .[$url] > 1 then input_filename
    else empty end
  )
] | join("\u0000")' /{dir,dir2}/*.json
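How it works: with -n, jq does not consume input automatically; inputs then streams in every file from both directories, [$url] destructures each url array to its single element, and the foreach state object keeps a running count per URL. Whenever a URL has already been seen, input_filename (the file currently being read) is emitted. If you want to preview what would be deleted before removing anything, a variant of the same filter prints one filename per line (this sketch assumes no filename contains a newline):
# Preview only: print each duplicate file on its own line, delete nothing.
jq -nr 'foreach inputs.url as [$url] ({};
    .[$url] += 1;
    if .[$url] > 1 then input_filename
    else empty end
  )' /{dir,dir2}/*.json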
To remove them, pipe the above command's output to xargs:
xargs -0 rm --
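Put together, the whole operation is a single pipeline. Because brace expansion lists the /dir files before the /dir2 files, the copy in /dir is the one that is kept and the matching file in /dir2 is the one removed (assuming each URL appears at most once per directory). The -r (--no-run-if-empty) flag shown here is a GNU xargs extension added for safety; it keeps rm from being invoked with no arguments when there are no duplicates at all:
# Sketch of the full pipeline: list duplicates NUL-separated, then delete them.
jq -nrj '[
  foreach inputs.url as [$url] ({};
    .[$url] += 1;
    if .[$url] > 1 then input_filename
    else empty end
  )
] | join("\u0000")' /{dir,dir2}/*.json | xargs -r -0 rm --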
Answered By - oguz ismail
Answer Checked By - Marilyn (WPSolving Volunteer)