Issue
I have a CSV file that has a mix of CRLF and LF line endings. In some places there is an LF where the content actually belongs to the preceding line.
Example:
smith;pete;he is very nice;1990CRLF
brown;mark;he is very nice;2010CRLF
taylor;sam;he isLF
very nice;2009CRLF
In my script, I want to remove all standalone instances of LF. I tried using sed:
sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g' $my_file
The problem with this solution is that the LFs belonging to the CRLFs also get replaced with a space character.
Solution
With perl, which doesn't remove the record separator by default and thus allows easy manipulation:
$ cat -A ip.txt
smith;pete;he is very nice;1990^M$
brown;mark;he is very nice;2010^M$
taylor;sam;he is$
very nice;2009^M$
$ perl -pe 's/(?<!\r)\n/ /' ip.txt
smith;pete;he is very nice;1990
brown;mark;he is very nice;2010
taylor;sam;he is very nice;2009
$ perl -pe 's/(?<!\r)\n/ /' ip.txt | cat -A
smith;pete;he is very nice;1990^M$
brown;mark;he is very nice;2010^M$
taylor;sam;he is very nice;2009^M$
(?<!\r)\n uses a negative look-behind to ensure that \n is replaced only when it is not preceded by \r.
Modifying OP's attempt:
$ sed -e ':a' -e 'N' -e '$!ba' -e 's/\([^\r]\)\n/\1 /g' ip.txt
smith;pete;he is very nice;1990
brown;mark;he is very nice;2010
taylor;sam;he is very nice;2009
\([^\r]\) ensures that the character preceding \n is not \r.
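One difference between the two commands is worth knowing: the sed pattern needs an actual character in front of \n, so a stray LF at the very start of the input, or one that directly follows another stray LF, is not matched. The sketch below compares both commands on a hypothetical record containing two stray LFs in a row. It assumes GNU sed (for \r inside the bracket expression), and the input line and variable names are made up for illustration.

```shell
# Hypothetical input: two consecutive stray LFs inside a single record.
input=$(mktemp)
printf 'taylor;sam;he is\n\nvery nice;2009\r\n' > "$input"

# sed variant: the group must consume one character before each \n.
sed_lf=$(sed -e ':a' -e 'N' -e '$!ba' -e 's/\([^\r]\)\n/\1 /g' "$input" \
  | tr -cd '\n' | wc -c | tr -d ' ')

# perl variant: the look-behind consumes nothing, so an empty line is handled too.
perl_lf=$(perl -pe 's/(?<!\r)\n/ /' "$input" \
  | tr -cd '\n' | wc -c | tr -d ' ')

echo "LFs left by sed:  $sed_lf"
echo "LFs left by perl: $perl_lf"

rm -f "$input"
```

With perl, only the LF of the closing CRLF should remain. If the data can contain empty lines or consecutive stray LFs, the perl command is the safer choice of the two.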
Answered By - Sundeep