Issue
Introduction
Hi! I'm trying to extract JSON from a 300K-line text file that mixes plain-text output with JSON from HTTP results. With that many lines, pulling the JSON out by hand is not practical.
Problem
I don't have much choice, so I probably need to fix it from the command line. Here's what the file looks like inside:
[2K 100.00% - C: 164148 / 164149 - S: 263 - F: 3686 - dhcp-140-247-148-215.fas.harvard.edu:443 - id3.sshws.me
[2K 100.00% - C: 164149 / 164149 - S: 263 - F: 3686 - public-1300503051.cos.ap-shanghai.myqcloud.com:443 - id3.sshws.me
[2K
[
{
"Request": {
"ProxyHost": "pro.ant.design",
"ProxyPort": 443,
"Bug": "pro.ant.design",
"Method": "HEAD",
"Target": "id3.sshws.me",
"Payload": "GET wss://pro.ant.design/ HTTP/1.1[crlf]Host: [host][crlf]Upgrade: websocket[crlf][crlf]"
},
"ResponseLine": [
"HTTP/1.1 101 Switching Protocol",
"Server: cloudflare"
]
},
{
"Request": {
"ProxyHost": "industrialtech.ft.com",
"ProxyPort": 443,
"Bug": "industrialtech.ft.com",
"Method": "HEAD",
"Target": "id3.sshws.me",
"Payload": "GET wss://industrialtech.ft.com/ HTTP/1.1[crlf]Host: [host][crlf]Upgrade: websocket[crlf][crlf]"
},
"ResponseLine": [
"HTTP/1.1 101 Switching Protocol",
"Server: cloudflare"
]
}
]
Several things make this hard to do with a regex:
It has multiple JSON objects
The text lines that aren't part of the JSON also contain [ and :
I realized the problem when trying to use this sed regex:
sed '/^[/,/^]/!d'
Solution
You can remove all lines that start with [ followed by any non-whitespace char:
sed '/^\[[^[:space:]]/d' file > newfile
Details:
^ - start of a line
\[ - a [ char
[^[:space:]] - any non-whitespace char.
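To see the pattern at work, here is a minimal sketch that runs the same sed command on a shortened copy of the sample from the question. The scratch directory and the file names scan.log and result.json are assumptions made up for this demo, and the sample is trimmed to one JSON object; only the sed expression comes from the answer.

```shell
#!/bin/sh
# Hypothetical scratch files; these names are assumptions for the demo.
tmpdir=$(mktemp -d)
input="$tmpdir/scan.log"
output="$tmpdir/result.json"

# A shortened stand-in for the real file: one progress line that starts
# with "[2K", a bare "[2K" line, then the JSON array.
cat > "$input" <<'EOF'
[2K 100.00% - C: 164148 / 164149 - S: 263 - F: 3686 - dhcp-140-247-148-215.fas.harvard.edu:443 - id3.sshws.me
[2K
[
{
"Request": {
"ProxyHost": "pro.ant.design",
"ProxyPort": 443
},
"ResponseLine": [
"HTTP/1.1 101 Switching Protocol",
"Server: cloudflare"
]
}
]
EOF

# Delete every line that starts with "[" immediately followed by a
# non-whitespace character. A line holding only "[" has nothing after
# the bracket, so it does not match and is kept.
sed '/^\[[^[:space:]]/d' "$input" > "$output"

cat "$output"
```

Both "[2K" lines are dropped, while the lone [ that opens the array and the "ResponseLine": [ line (which does not start with a bracket) are kept. If jq happens to be installed, running jq . on the output file is one way to check that what remains parses as JSON.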
Answered By - Wiktor Stribiżew
Answer Checked By - Mary Flores (WPSolving Volunteer)