Monday, October 25, 2021

[SOLVED] How to use sed to substitute LF with space, but not CRLF?

Issue

I have a CSV file that has a mix of CRLF and LF line endings. In some places there is a lone LF in the middle of a record, where the content in fact belongs to the line before.

Example:

smith;pete;he is very nice;1990CRLF
brown;mark;he is very nice;2010CRLF
taylor;sam;he isLF
very nice;2009CRLF

In my script, I want to remove all standalone instances of LF. I tried using sed:

sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g' "$my_file"

The problem with this solution is that the LFs belonging to the CRLFs also get replaced with a space character.


Solution

With perl, which doesn't remove the record separator by default and thus allows easy manipulation:

$ cat -A ip.txt
smith;pete;he is very nice;1990^M$
brown;mark;he is very nice;2010^M$
taylor;sam;he is$
very nice;2009^M$

$ perl -pe 's/(?<!\r)\n/ /' ip.txt
smith;pete;he is very nice;1990
brown;mark;he is very nice;2010
taylor;sam;he is very nice;2009

$ perl -pe 's/(?<!\r)\n/ /' ip.txt | cat -A
smith;pete;he is very nice;1990^M$
brown;mark;he is very nice;2010^M$
taylor;sam;he is very nice;2009^M$

(?<!\r)\n uses a negative look-behind to ensure that \n is replaced only when it is not preceded by \r.


Modifying OP's attempt:

$ sed -e ':a' -e 'N' -e '$!ba' -e 's/\([^\r]\)\n/\1 /g' ip.txt
smith;pete;he is very nice;1990
brown;mark;he is very nice;2010
taylor;sam;he is very nice;2009

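\([^\r]\) captures the character preceding \n and ensures it is not \r; \1 puts that character back in the replacement. Matching \r this way inside a bracket expression relies on GNU sed; other sed implementations may treat it as a literal backslash and r.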
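
If the sed at hand does not interpret \r inside a bracket expression, one possible workaround is to put a literal carriage return into the pattern through a shell variable. This is only a sketch: the variable name cr, the temporary directory and the file names are illustrative, and the sed expression is double-quoted so that the shell expands $cr.

```shell
# Scratch directory and sample input (illustrative names).
tmpdir=$(mktemp -d)
printf 'smith;pete;he is very nice;1990\r\n'  >  "$tmpdir/ip.txt"
printf 'brown;mark;he is very nice;2010\r\n'  >> "$tmpdir/ip.txt"
printf 'taylor;sam;he is\nvery nice;2009\r\n' >> "$tmpdir/ip.txt"

# Hold a literal carriage return; command substitution strips only trailing LFs.
cr=$(printf '\r')

# Same idea as the command above: slurp the whole file into the pattern space,
# then replace LF with a space only when the preceding character is not CR.
sed -e ':a' -e 'N' -e '$!ba' -e "s/\([^$cr]\)\n/\1 /g" "$tmpdir/ip.txt" > "$tmpdir/fixed.txt"

# Show the result with CR made visible as \r.
od -c "$tmpdir/fixed.txt"
```

Like the original command, this reads the entire file into memory before substituting, so it is best suited to files of moderate size.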



Answered By - Sundeep