Saturday, March 12, 2022

[SOLVED] Convert JSON to CSV Python - Command Line

Issue

I'm new to coding, so please forgive me if I need a step-by-step walkthrough. I have a large JSON file (544 MB) that I need to convert to CSV. I found the code below on another forum and can open it in VS Code, but I'm not sure what to type in the Terminal to actually run the conversion, or how to run it for the multiple similar files I need to convert.

Here's the code I have, json-convert.py:

from copy import deepcopy
import pandas
import json


def cross_join(left, right):
    new_rows = [] if right else left
    for left_row in left:
        for right_row in right:
            temp_row = deepcopy(left_row)
            for key, value in right_row.items():
                temp_row[key] = value
            new_rows.append(deepcopy(temp_row))
    return new_rows


def flatten_list(data):
    for elem in data:
        if isinstance(elem, list):
            yield from flatten_list(elem)
        else:
            yield elem


def json_to_dataframe(data_in):
    def flatten_json(data, prev_heading=''):
        if isinstance(data, dict):
            rows = [{}]
            for key, value in data.items():
                rows = cross_join(rows, flatten_json(value, prev_heading + '.' + key))
        elif isinstance(data, list):
            rows = []
            for item in data:
                rows.extend(flatten_list(flatten_json(item, prev_heading)))
        else:
            rows = [{prev_heading[1:]: data}]
        return rows

    return pandas.DataFrame(flatten_json(data_in))

if __name__ == '__main__':
    # Use a context manager so the file is closed once it has been read
    with open('pretty-202009.json') as f:
        json_data = json.load(f)
    df = json_to_dataframe(json_data)
    df.to_csv("flight_csv.csv", sep=',', encoding='utf-8')


    # run  in terminal for conversion to csv

Solution

I will presume that your functions are correct because this isn't the question you asked.

If you already know which files you need to convert, why not loop over them with a for loop? Something like:

⋮
if __name__ == '__main__':
    files_to_process = ('pretty-202009.json', 'pretty-202010.json', 'pretty-202011.json')
    for order, current_file_name in enumerate(files_to_process):
        with open(current_file_name) as current_file:
            json_data = json.load(current_file)
            df = json_to_dataframe(json_data)
            df.to_csv(f'flight{order}_csv.csv', sep=',', encoding='utf-8')
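
The question also asks what to type in the Terminal. One possible approach, sketched here rather than taken from the original answer, is to pass the file names as command-line arguments instead of hard-coding them. The helper names parse_args, output_path_for and convert_all are hypothetical, as is the choice to write each CSV next to its input with the same base name. convert_all takes the conversion function as a parameter, so json_to_dataframe from the question can be passed in unchanged.

```python
import argparse
import json
from pathlib import Path


def output_path_for(json_path):
    """Return the CSV path beside the input, e.g. data/a.json -> data/a.csv."""
    return Path(json_path).with_suffix('.csv')


def parse_args(argv=None):
    """Read one or more JSON file names from the command line."""
    parser = argparse.ArgumentParser(description='Convert JSON files to CSV.')
    parser.add_argument('files', nargs='+', help='one or more JSON files to convert')
    return parser.parse_args(argv)


def convert_all(file_names, to_dataframe):
    """Load each JSON file, convert it with to_dataframe, and write a CSV beside it."""
    written = []
    for name in file_names:
        with open(name, encoding='utf-8') as current_file:
            json_data = json.load(current_file)
        df = to_dataframe(json_data)
        target = output_path_for(name)
        df.to_csv(target, sep=',', encoding='utf-8')
        written.append(target)
    return written


# In json-convert.py, the main block could then become (assumption, not the
# original answer's code):
# if __name__ == '__main__':
#     args = parse_args()
#     convert_all(args.files, json_to_dataframe)
```

With a main block like that, the conversion could be started from the Terminal in the folder that holds the script and the data, for example:

    python json-convert.py pretty-202009.json pretty-202010.json pretty-202011.json

Depending on the installation, the interpreter may be called python3 or py instead of python. Note that json.load reads each whole file into memory, so a 544 MB input may need a few GB of RAM.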


Answered By - EvensF
Answer Checked By - David Marino (WPSolving Volunteer)