Monday, December 13, 2021

[SOLVED] jq parse json with stream flag into different json file

Tags: jq, json, linux, shell

Issue

I have a JSON file called data.json, shown below. I want to parse it with jq in streaming mode (that is, without loading the whole file into memory), because the real data is 20GB.

As far as I can tell, streaming mode in jq is enabled with the --stream flag, which makes jq parse the JSON file incrementally instead of all at once.

{
  "id": {
    "bioguide": "E000295",
    "thomas": "02283",
    "govtrack": 412667,
    "opensecrets": "N00035483",
    "lis": "S376"
  },
  "bio": {
    "gender": "F",
    "birthday": "1970-07-01"
  },
  "tooldatareports": [
    {
      "name": "A",
      "tooldata": [
        {
          "toolid": 12345,
          "data": [
            {
              "time": "2021-01-01",
              "value": 1
            },
            {
              "time": "2021-01-02",
              "value": 10
            },
            {
              "time": "2021-01-03",
              "value": 5
            }
          ]
        },
        {
          "toolid": 12346,
          "data": [
            {
              "time": "2021-01-01",
              "value": 10
            },
            {
              "time": "2021-01-02",
              "value": 100
            },
            {
              "time": "2021-01-03",
              "value": 50
            }
          ]
        }
      ]
    }
  ]
}

The final result I am hoping for is shown below.

It is a list containing two dicts, each holding a "data" list whose entries have 2 keys (time and value).

[
  {
    "data": [
      {
        "time": "2021-01-01",
        "value": 1
      },
      {
        "time": "2021-01-02",
        "value": 10
      },
      {
        "time": "2021-01-03",
        "value": 5
      }
    ]
  },
  {
    "data": [
      {
        "time": "2021-01-01",
        "value": 10
      },
      {
        "time": "2021-01-02",
        "value": 100
      },
      {
        "time": "2021-01-03",
        "value": 50
      }
    ]
  }
]

For this problem, I used the command below to get a result, but it still differs from what I want.

cat data.json | jq --stream 'select(.[0][0]=="tooldatareports" and .[0][2]=="tooldata" and .[1]!=null) | .'

The result is not a list of dicts. Instead, each time and each value ends up in its own separate list.
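
For illustration, this is roughly what --stream does to a tiny document (the sample below is made up, not taken from the real file). Every scalar becomes its own [path, value] event, followed by closing events that carry only a path, which would explain why each time and value shows up in a separate list.

```shell
# Hypothetical one-record sample, standing in for the real data
sample='{"data":[{"time":"2021-01-01","value":1}]}'

# --stream turns the document into events; -c prints one event per line
events=$(printf '%s' "$sample" | jq -c --stream '.')
printf '%s\n' "$events"
# [["data",0,"time"],"2021-01-01"]
# [["data",0,"value"],1]
# [["data",0,"value"]]
# [["data",0]]
# [["data"]]
```

The first two lines are leaf events; the last three close the inner object, the array, and the top-level object.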

Does anyone have an idea how to do this?

Solution

Here's a solution that does not use truncate_stream:

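jq -n --stream '
  # Rebuild one value from each run of events whose path contains "data"
  [fromstream(
    inputs
    | (.[0] | index("data")) as $ix   # position of "data" in the path, or null
    | select($ix)                     # drop events outside any "data" subtree
    | .[0] |= .[$ix:] )]              # re-root the path so it starts at "data"
' data.json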
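
A quick way to sanity-check the filter is to run it on a cut-down sample first. This is only a sketch: the file name sample.json and its two one-entry records are made up for illustration, and -c is added just to keep the output on one line.

```shell
# Hypothetical cut-down sample with the same nesting as the question's file
cat > sample.json <<'EOF'
{"tooldatareports":[{"name":"A","tooldata":[
  {"toolid":12345,"data":[{"time":"2021-01-01","value":1}]},
  {"toolid":12346,"data":[{"time":"2021-01-01","value":10}]}
]}]}
EOF

# Same filter as above, applied to the small sample
result=$(jq -n -c --stream '
  [fromstream(
    inputs
    | (.[0] | index("data")) as $ix
    | select($ix)
    | .[0] |= .[$ix:] )]
' sample.json)
printf '%s\n' "$result"
# [{"data":[{"time":"2021-01-01","value":1}]},{"data":[{"time":"2021-01-01","value":10}]}]
```

As far as I can tell, the filter relies on "data" being the last key of each tool object, because the event that closes that object is what tells fromstream a value is complete. If the key order in the real file differs, the output may need checking.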

Answered By - peak

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
