Issue
I have a json file as below called data.json, I want to parse the data with jq tool in streaming mode(do not load the whole file into memory), because the real data have 20GB
the streaming mode in jq seems to add a flag --stream and it will parse the json file row by row
{
"id": {
"bioguide": "E000295",
"thomas": "02283",
"govtrack": 412667,
"opensecrets": "N00035483",
"lis": "S376"
},
"bio": {
"gender": "F",
"birthday": "1970-07-01"
},
"tooldatareports": [
{
"name": "A",
"tooldata": [
{
"toolid": 12345,
"data": [
{
"time": "2021-01-01",
"value": 1
},
{
"time": "2021-01-02",
"value": 10
},
{
"time": "2021-01-03",
"value": 5
}
]
},
{
"toolid": 12346,
"data": [
{
"time": "2021-01-01",
"value": 10
},
{
"time": "2021-01-02",
"value": 100
},
{
"time": "2021-01-03",
"value": 50
}
]
}
]
}
]
}
The final result I hope it can become as below
A list contains two dict, each dict contain 2 keys
[
{
"data": [
{
"time": "2021-01-01",
"value": 1
},
{
"time": "2021-01-02",
"value": 10
},
{
"time": "2021-01-03",
"value": 5
}
]
},
{
"data": [
{
"time": "2021-01-01",
"value": 10
},
{
"time": "2021-01-02",
"value": 100
},
{
"time": "2021-01-03",
"value": 50
}
]
}
]
For this problem, I use the below command line to get a result, but it still has some differences.
cat data.json | jq --stream 'select(.[0][0]=="tooldatareports" and .[0][2]=="tooldata" and .[1]!=null) | .'
- the result is not a list contain a lot of dict
- for each time and value are separate in the different list
Does anyone have any idea about this?
Solution
Here's a solution that does not use truncate_stream
:
jq -n --stream '
[fromstream(
inputs
| (.[0] | index("data")) as $ix
| select($ix)
| .[0] |= .[$ix:] )]
' input.json
Answered By - peak