Issue
Okay, so I've got over a hundred JSON files with predictable bad formatting in several places per file.
Instead of using [ ] to indicate an array, they use { }.
For example:
"grid": {
"C1", "D1", "E1", "C2", "D2", "E2", "F2", "B3", "C3", "D3", "E3", "F3", "B4", "C4", "D4", "E4", "F4", "C5", "D5", "E5", "F5", "C6", "D6", "E6"
},
Each file has multiple arrays in it with this problem, each with a different key.
I came up with this to fix the above example, but it isn't very universal:
sed 's/^\t\t"grid": {/\t\t"grid": [/; s/"E6" },$/"E6" ],/' myfile.json
I also tried writing a more complicated awk script, something along these lines:
awk -i '/grid/ { gsub("{",{["); gsub("}","]") print $0 }' myfile.json
But it replaced the contents of myfile.json to be only the row that contained the string "grid".
Is there a reliable one-liner to fix this issue?
Solution
I propose the following GNU AWK solution. Let the content of file.json be
{"hello": 1,
"grid": {"C1", "D1", "E1", "C2", "D2", "E2", "F2", "B3", "C3", "D3", "E3", "F3", "B4", "C4", "D4", "E4", "F4", "C5", "D5", "E5", "F5", "C6", "D6", "E6"},
"something": "else"}
then
awk 'BEGIN{FPAT=".";OFS=""}/grid/&&match($0,/\{[^}]*\}/){$RSTART="[";$(RSTART+RLENGTH-1)="]"}{print}' file.json
gives output
{"hello": 1,
"grid": ["C1", "D1", "E1", "C2", "D2", "E2", "F2", "B3", "C3", "D3", "E3", "F3", "B4", "C4", "D4", "E4", "F4", "C5", "D5", "E5", "F5", "C6", "D6", "E6"],
"something": "else"}
Explanation: first I tell GNU AWK that a field is any single character (.) and that the output field separator (OFS) is an empty string (without that there would be unwanted spaces in the output). Then, for each line that contains grid and also contains a literal { followed by zero or more (*) characters that are not (^) } and then a literal }, I replace the first character of the match ($RSTART) with [ and the last character of the match ($(RSTART+RLENGTH-1)) with ]. Every line, altered or not, is then printed. Note that I use the match function rather than a bare regular expression because I then use RSTART and RLENGTH, which are set by that function. Note also that the return value of match is part of the condition, so if a line contains grid but no {...} then that line remains unchanged.
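To make the role of RSTART and RLENGTH concrete, here is a small illustration that only reports the positions instead of editing anything. The helper name show_match and the sample input are made up for this sketch; the pattern is the one from the command above.

```shell
# Hypothetical helper: for each input line containing a {...} run, print
# where the run starts, how long it is, and where it ends.
show_match() {
  awk 'match($0, /\{[^}]*\}/) { print RSTART, RLENGTH, RSTART + RLENGTH - 1 }'
}

printf '%s\n' 'x {"a", "b"} y' | show_match
# prints: 3 10 12
```

Here the { is character 3 and the } is character 12, which are exactly the two single-character fields that the command overwrites.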
(tested in gawk 4.2.1)
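The question mentions several arrays per file under different keys, and its example spans more than one line, whereas the command above targets the grid key on a single line. As a rough sketch only (not part of the original answer), the same match/RSTART/RLENGTH idea can be applied to the whole file at once, treating any { ... } whose contents are nothing but comma-separated double-quoted strings as a mis-written array. The function name fix_arrays and the file sample.json are invented for illustration. The sketch assumes that strings contain no escaped quotes or braces and that an empty {} should be left alone, and it uses only POSIX awk features.

```shell
# fix_arrays FILE: print FILE with every { "a", "b", ... } run rewritten
# as [ "a", "b", ... ]. Hypothetical helper, an assumption-laden sketch.
fix_arrays() {
  awk '
    { text = text $0 "\n" }   # collect the whole file so a match may span lines
    END {
      ws  = "[ \t\n]*"        # optional whitespace, including newlines
      str = "\"[^\"]*\""      # a double-quoted string without escaped quotes
      re  = "[{]" ws str "(" ws "," ws str ")*" ws "[}]"
      out = ""
      while (match(text, re)) {
        # copy what precedes the match, then the match with its braces swapped
        out  = out substr(text, 1, RSTART - 1) "[" substr(text, RSTART + 1, RLENGTH - 2) "]"
        text = substr(text, RSTART + RLENGTH)
      }
      printf "%s", out text
    }
  ' "$1"
}

# Made-up sample with two broken arrays and one genuine object.
cat > sample.json <<'EOF'
{
  "hello": 1,
  "grid": {
    "C1", "D1", "E1"
  },
  "nested": {"keep": "me"},
  "other": {"A1", "B2"}
}
EOF

fix_arrays sample.json
```

Because a real object has a colon after its first string, "nested" is left untouched while "grid" and "other" are converted. When running this over many files, writing each result to a temporary file and moving it over the original only after checking it is likely safer than editing in place.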
Answered By - Daweo Answer Checked By - Cary Denson (WPSolving Admin)