Issue
I know I can use sed 's/[[:blank:]]/,/g'
to convert blanks into commas (or anything else of my choosing) in my file, but is there a way to convert only the first 5 runs of whitespace into commas?
This is because my last column contains free text, so it is annoying when sed converts all the spaces in that column into commas.
Sample input file:
sample1 gi|11| 123 33 97.23 This is a sentence
sample2 gi|22| 234 33 97.05 Keep these spaces
And the output I was looking for:
sample1,gi|11|,123,33,97.23,This is a sentence
sample2,gi|22|,234,33,97.05,Keep these spaces
Only the first 5 chains of whitespace are changed to a comma.
Solution
With GNU awk (for the 3rd arg to match() and the \s/\S shorthands):
$ awk '{ match($0,/((\S+\s+){5})(.*)/,a); gsub(/\s+/,",",a[1]); print a[1] a[3] }' file
sample1,gi|11|,123,33,97.23,This is a sentence
sample2,gi|22|,234,33,97.05,Keep these spaces
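If GNU awk is not available, the same idea can probably be expressed in POSIX awk by peeling off one whitespace run at a time with the two-argument match(). This is a minimal sketch, not a drop-in replacement: the function name first5_to_commas is hypothetical, and it assumes that leading whitespace should be dropped and that lines with fewer than 5 separators should simply have all of them converted.

```shell
# Hypothetical helper: convert only the first 5 runs of whitespace to commas.
# Uses POSIX awk features only (2-argument match(), RSTART, RLENGTH).
first5_to_commas() {
    awk '{
        out  = ""
        rest = $0
        sub(/^[[:space:]]+/, "", rest)            # assumption: ignore leading whitespace
        for (i = 1; i <= 5; i++) {
            if (match(rest, /[[:space:]]+/)) {
                # keep the text before the whitespace run, then add a comma
                out  = out substr(rest, 1, RSTART - 1) ","
                rest = substr(rest, RSTART + RLENGTH)
            } else {
                break                             # fewer than 5 separators on this line
            }
        }
        print out rest                            # the remainder keeps its spaces
    }' "$@"
}
```

It reads the named files or standard input, so it could be called as first5_to_commas file.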
However, I'd recommend turning the output into valid CSV (i.e. CSV that conforms to RFC 4180), such as could be read by MS-Excel and other tools, since the free-text field, e.g. "This is a sentence" (and possibly other fields), can presumably include commas and double quotes:
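$ awk '{
    gsub(/"/,"\"\"")                  # double any embedded double quotes
    match($0,/((\S+\s+){5})(.*)/,a)   # a[1] = first 5 fields plus separators, a[3] = the rest
    gsub(/\s+/,"\",\"",a[1])          # each whitespace run in a[1] becomes ","
    print "\"" a[1] a[3] "\""         # add the opening and closing quotes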
}' file
"sample1","gi|11|","123","33","97.23","This is a sentence"
"sample2","gi|22|","234","33","97.05","Keep these spaces"
For example given this input:
$ cat file
sample1 gi|11| 123 33 97.23 This is a sentence
a,b,sample2 gi|22| 234 33 97.05 This is, "typically", a sentence
The output from the first script is not valid CSV:
$ awk '{ match($0,/((\S+\s+){5})(.*)/,a); gsub(/\s+/,",",a[1]); print a[1] a[3] }' file
sample1,gi|11|,123,33,97.23,This is a sentence
a,b,sample2,gi|22|,234,33,97.05,This is, "typically", a sentence
while the output from the 2nd script IS valid CSV:
$ awk '{ gsub(/"/,"\"\""); match($0,/((\S+\s+){5})(.*)/,a); gsub(/\s+/,"\",\"",a[1]); print "\"" a[1] a[3] "\"" }' file
"sample1","gi|11|","123","33","97.23","This is a sentence"
"a,b,sample2","gi|22|","234","33","97.05","This is, ""typically"", a sentence"
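Since the question started from sed, the simple unquoted conversion can also be sketched with sed alone: without the g flag, each s command replaces only the first remaining run of blanks, so repeating it five times converts exactly the first five. The function name first5_sed is hypothetical. Like the first awk script, this does not quote fields, so it does not produce valid CSV when the data contains commas or double quotes, and a line with leading blanks would start with a comma.

```shell
# Hypothetical helper: five non-global substitutions, one per separator.
# \{1,\} is the POSIX BRE spelling of "one or more".
first5_sed() {
    sed -e 's/[[:blank:]]\{1,\}/,/' \
        -e 's/[[:blank:]]\{1,\}/,/' \
        -e 's/[[:blank:]]\{1,\}/,/' \
        -e 's/[[:blank:]]\{1,\}/,/' \
        -e 's/[[:blank:]]\{1,\}/,/' \
        "$@"
}
```

As with the awk version, it reads the named files or standard input, e.g. first5_sed file.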
Answered By - Ed Morton