Issue
The goal is to compare a JSON file against a "key" of standard values and add those values to objects in another JSON file if certain strings match. The purpose is to merge two sets of analytics that have complementary data.
The condition I have been trying to match on is when the href from index-of-pages.json includes the url string from key.json.
index-of-pages.json
[
{
"href": "articles/guide1/page1.html",
"name": "Page 1",
"views": "204"
},
{
"href": "articles/guide2/page2.html",
"name": "Page 2",
"views": "180"
},
{
"href": "articles/guide2/page3.html",
"name": "Page 3",
"views": "121"
},
{
"href": "apis/apiguide1/subguide1/page4.html",
"name": "Page 4",
"views": "101"
},
{
"href": "apis/apiguide2/subguide2/page5.html",
"name": "Page 5",
"views": "103"
},
{
"href": "articles/guide1/about.html",
"name": "Page 6",
"views": "103"
},
{
"href": "index.html",
"name": "Page 7",
"views": "400"
}
]
key.json
[
{
"url": "/guide1/",
"guide": "Guide 1",
"tag": "how-to"
},
{
"url": "/guide2/",
"guide": "Guide 2",
"tag": "how-to"
},
{
"url": "/apiguide1/subguide1/",
"guide": "API Guide 1",
"subguide": "Subguide 1",
"tag": "api"
},
{
"url": "/guide1/about",
"guide": "Guide 1",
"tag": "about"
}
]
Note that there is no trailing slash on url in the last object.
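Under that condition, Page 6 (articles/guide1/about.html) matches two entries: the slash-free /guide1/about and the broader /guide1/. So some tie-breaking rule is needed. The quick Python check below applies the plain substring test to the sample data above. The helper name matching_urls is hypothetical and only illustrates the condition.

```python
# Hypothetical helper: list every key url that occurs as a substring of an href,
# mirroring the "href includes url" condition described above.
pages = [
    "articles/guide1/page1.html",
    "articles/guide2/page2.html",
    "articles/guide2/page3.html",
    "apis/apiguide1/subguide1/page4.html",
    "apis/apiguide2/subguide2/page5.html",
    "articles/guide1/about.html",
    "index.html",
]
key_urls = ["/guide1/", "/guide2/", "/apiguide1/subguide1/", "/guide1/about"]


def matching_urls(href: str) -> list[str]:
    """Return every key url that appears somewhere inside href."""
    return [url for url in key_urls if url in href]


for href in pages:
    print(href, "->", matching_urls(href))
```

Page 6 prints both /guide1/ and /guide1/about. Page 5 matches nothing, because "/guide2/" never occurs in apis/apiguide2/subguide2/page5.html: the "guide2/" there is preceded by "i" or "b", not "/".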
Desired result:
[
{
"href": "articles/guide1/page1.html",
"name": "Page 1",
"views": "204",
"url": "/guide1/",
"guide": "Guide 1",
"tag": "how-to"
},
{
"href": "articles/guide2/page2.html",
"name": "Page 2",
"views": "180",
"url": "/guide2/",
"guide": "Guide 2",
"tag": "how-to"
},
{
"href": "articles/guide2/page3.html",
"name": "Page 3",
"views": "121",
"url": "/guide2/",
"guide": "Guide 2",
"tag": "how-to"
},
{
"href": "apis/apiguide1/subguide1/page4.html",
"name": "Page 4",
"views": "101",
"url": "/apiguide1/subguide1/",
"guide": "API Guide 1",
"subguide": "Subguide 1",
"tag": "api"
},
{
"href": "apis/apiguide2/subguide2/page5.html",
"name": "Page 5",
"views": "103"
},
{
"href": "articles/guide1/about.html",
"name": "Page 6",
"views": "103",
"url": "/guide1/about",
"guide": "Guide 1",
"tag": "about"
},
{
"href": "index.html",
"name": "Page 7",
"views": "400"
}
]
Objects in index-of-pages.json that do not match anything in the key should still be included in the output, unchanged.
I'm not sure whether it is best practice to include every key in every output object, even when the value would be empty.
This attempt has brought me closest, but I cannot figure out how to add a step that matches each href against the url values in the key:
jq --argfile uid key.json '
($uid | INDEX(.url)) as $dict
| map( $dict[.href] + del(.href) )
' index-of-pages.json
Other attempts, such as the following, do not produce a 1:1 match of objects; instead, they produce a huge list of every possible combination of every key (the output was nested, so I labeled it key; not all desired output keys are shown in this script):
(.[].href/"/"?|{key: ("/" + .[-2] + "/")}) as $abc | {name: .[].name, level: $abc}
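The blow-up comes from how jq generators combine. In `EXPR as $x | BODY`, BODY runs once for every output of EXPR, and `{name: .[].name, ...}` builds one object per element of the array. With N pages that gives N × N objects instead of N. The nested loops below are a rough Python analogue of that evaluation order, not jq itself, run on a three-page subset of the sample data. The helper name second_to_last_segment is hypothetical.

```python
# Rough Python analogue of:
#   (.[].href/"/"? | {key: ("/" + .[-2] + "/")}) as $abc | {name: .[].name, level: $abc}
pages = [
    {"href": "articles/guide1/page1.html", "name": "Page 1"},
    {"href": "articles/guide2/page2.html", "name": "Page 2"},
    {"href": "index.html", "name": "Page 7"},
]


def second_to_last_segment(href: str) -> dict:
    parts = href.split("/")
    # jq's .[-2] is null when there is no such element, and "/" + null is "/",
    # so a bare "index.html" ends up as "//".
    segment = parts[-2] if len(parts) >= 2 else ""
    return {"key": "/" + segment + "/"}


results = []
for page_for_abc in pages:        # "as $abc": the body runs once per href
    abc = second_to_last_segment(page_for_abc["href"])
    for page_for_name in pages:   # ".[].name" inside the object: once per name
        results.append({"name": page_for_name["name"], "level": abc})

print(len(results))  # 9 == 3 * 3, not 3
```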
I have also tried variations on while loops containing if statements, with no success:
jq -r '.[] | "\(.url)|\(.guide)|\(.tag)|\(.subguide)"' key.json |
while IFS="|" read -r url guide tag subguide; do
jq --arg url "$url" --arg guide "$guide" --arg tag "$tag" --arg subguide "$subguide" '.[] | if (.href | contains($url)) then . + {guide: $guide, tag: $tag, subguide: $subguide} else . end' index-of-pages.json
done
Thank you for any insight or guidance.
Solution
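I don't think INDEX can help here: INDEX(.url) builds an object keyed on the exact url strings, so $dict[.href] only finds something when an href is identical to a url, whereas here you need a substring match.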
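The difference between an exact-key lookup and a substring search is easy to see with a Python dict, which behaves like the object INDEX(.url) produces. This is a sketch using two entries from key.json, not jq code.

```python
# Analogue of ($uid | INDEX(.url)) as $dict: an object keyed by the exact url string.
key = [
    {"url": "/guide1/", "guide": "Guide 1", "tag": "how-to"},
    {"url": "/guide1/about", "guide": "Guide 1", "tag": "about"},
]
lookup = {entry["url"]: entry for entry in key}

href = "articles/guide1/page1.html"
exact = lookup.get(href)                          # like $dict[.href]: exact key only
substring = [e for e in key if e["url"] in href]  # what the question actually needs

print(exact)      # None
print(substring)  # [{'url': '/guide1/', 'guide': 'Guide 1', 'tag': 'how-to'}]
```

In jq the missing key yields null, and `null + del(.href)` is just `del(.href)`, so that attempt drops href without merging anything.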
What I'd do instead is this:
sort_by(.url | -length) as $c | inputs | map(. + (.href as $s | first($c[] | select(.url as $ss | $s | index($ss))) // {}))
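In case it's unclear, the jq invocation looks like this (key.json is the first input, which is sorted and bound to $c; inputs then reads the array from index-of-pages.json):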
jq '...' key.json index-of-pages.json
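For anyone who wants to sanity-check the matching rule outside jq, here is a rough Python equivalent of the filter above. It is a sketch, not part of the original answer, and the function names merge_pages and merge_files are hypothetical. It keeps the filter's three ideas:

- Try key entries with the longest url first, so /guide1/about wins over /guide1/.
- Take the first entry whose url is a substring of href.
- Merge an empty object when nothing matches, so unmatched pages pass through unchanged.

```python
import json


def merge_pages(key: list[dict], pages: list[dict]) -> list[dict]:
    # Longest url first, like sort_by(.url | -length); Python's sorted() is stable,
    # so equal-length urls keep their key.json order.
    candidates = sorted(key, key=lambda entry: -len(entry["url"]))
    merged = []
    for page in pages:
        # Like first($c[] | select(...)) // {}: first substring match, else {}.
        match = next((e for e in candidates if e["url"] in page["href"]), {})
        # Like ". + match": keys from the key entry are added after the page's own.
        merged.append({**page, **match})
    return merged


def merge_files(key_path: str, pages_path: str) -> list[dict]:
    """Load both JSON arrays from disk and merge them (hypothetical helper)."""
    with open(key_path, encoding="utf-8") as f:
        key = json.load(f)
    with open(pages_path, encoding="utf-8") as f:
        pages = json.load(f)
    return merge_pages(key, pages)
```

With both files in the current directory, `print(json.dumps(merge_files("key.json", "index-of-pages.json"), indent=2))` should print the merged array. Page 3 also picks up the Guide 2 fields, since its href contains /guide2/.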
Answered By - oguz ismail Answer Checked By - Willingham (WPSolving Volunteer)