Issue
I have a huge CSV file, about 500 MB. The field separator is the pipe character (|). Some records contain embedded newlines and carriage returns. I need to remove them, but I don't want to remove the legitimate newline at the end of each record. I have tried sed and tr as shown in other questions, but I end up removing all newlines, which is not what I want.
Sample input (the ||Yes line should be joined to the previous line):
21/06/2016 18:06:32|||||||||||||||||||32 Red|Jrup Vej 6|61069849
||Yes||vals
21/06/2016 18:06:32|||||||||||||||||||101 K|Ser Bevard 110|||No|
My attempts:
sed -i 's/\r\n//g' myfile.csv
tr -d '\r' < myfile.csv
Thanks for any help, Joe
Solution
CSV data usually has a fixed number of columns.
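If the expected column count is not known up front, one way to estimate it is to tally how many lines have each field count; the most frequent (and usually largest) count is a reasonable guess for a complete record. This is only a sketch: count_fields is a hypothetical helper name, and it assumes the pipe separator from the question.

```shell
# count_fields: hypothetical helper, not part of the original answer.
# Prints "<field count> <number of lines>" pairs, sorted by field count.
count_fields() {
    awk -F '|' '{ n[NF]++ } END { for (k in n) print k, n[k] }' "$@" | sort -n
}

# Usage (file name taken from the question):
#   count_fields myfile.csv
```

Lines with a smaller field count than the dominant one are the likely fragments of broken records.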
In your case, it seems a full record has 25 columns. Based on that, you can use this awk command, which appends the following line to any line that has fewer than 25 fields, to join broken records:
awk -F '|' 'NF < 25 {getline s; $0 = $0 s} 1' file
Output for the sample input:
21/06/2016 18:06:32|||||||||||||||||||32 Red|Jrup Vej 6|61069849||Yes||vals
21/06/2016 18:06:32|||||||||||||||||||101 K|Ser Bevard 110|||No|
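Note that the one-liner above appends at most one following line per record and leaves any carriage returns in place. If a record can be split across more than two lines, a variant along these lines may help. It is a sketch under assumptions, not a tested fix for the real file: join_records, want and nextline are hypothetical names, and the default of 25 columns is simply the figure used above.

```shell
# join_records: hypothetical helper, not part of the original answer.
# Usage: join_records FILE [COLUMNS]   (FILE may be "-" for standard input)
join_records() {
    awk -F '|' -v want="${2:-25}" '
        { gsub(/\r/, "") }                 # remove carriage returns
        {
            # keep appending following lines until the record has enough fields
            while (NF < want && (getline nextline) > 0) {
                gsub(/\r/, "", nextline)
                $0 = $0 nextline           # assigning $0 makes awk recompute NF
            }
            print
        }
    ' "${1:--}"
}

# Usage (file name taken from the question):
#   join_records myfile.csv > fixed.csv
```

Counting fields cannot detect a break that falls inside the last column, because the first fragment already has the full number of fields, so it is worth spot-checking the result.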
Answered By - anubhava Answer Checked By - David Goodson (WPSolving Volunteer)