Thursday, March 17, 2022

[SOLVED] Split large csv file into multiple files and keep header in each part

Issue

How can I split a large CSV file (1 GB) into multiple files (say, the first part with 1000 rows, the second with 10000 rows, the third with 100000, etc.) and keep the header in each part?

For example, how can I split this

h1 h2
a  aa
b  bb
c  cc
.
.
12483720 rows

into

h1 h2
a  aa
b  bb
.
.
.
1000 rows

And

h1 h2
x  xx
y  yy
.
.
.
10000 rows

Solution

Another awk solution. First, some test records (the first line, 1, stands in for the header):

$ seq 1 1234567 > file

Then the awk:

$ awk 'NR==1{n=1000;h=$0}{print > n}NR==n+c{n*=10;c=NR-1;print h>n}' file
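Because the first line of the seq test file is just 1, the carried-over header is hard to spot. As a sketch, the same one-liner can be run on input with a real header line and the first line of every part checked. The header h1 h2 and the rN vN row format below are made up to mimic the question; they are not from the original answer.

```shell
# Hypothetical test input shaped like the question: a header line
# followed by 1234567 data rows ("r1 v1", "r2 v2", ...).
awk 'BEGIN {
    print "h1 h2"
    for (i = 1; i <= 1234567; i++) print "r" i, "v" i
}' > file

# The one-liner from the answer, unchanged.
awk 'NR==1{n=1000;h=$0}{print > n}NR==n+c{n*=10;c=NR-1;print h>n}' file

# Every part should begin with the header line.
for f in 1000 10000 100000 1000000 10000000; do
    printf '%s: %s\n' "$f" "$(head -n 1 "$f")"
done
```

Each line of the loop's output should end in h1 h2. With this input the last part, 10000000, holds the leftover 123571 data rows plus its header, 123572 lines.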

Explained:

$ awk '
NR==1 {           # first record:
    n=1000        # set first output file size and
    h=$0          # store the header
}
{
    print > n     # write the record to the current output file
}
NR==n+c {         # current file is full (a close(n) would go here if needed)
    n*=10         # grow target magnitude
    c=NR-1        # offset so the next file also ends up with n lines, header included
    print h > n   # start the new file with the header
}' file
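The comment above leaves close() optional. With only five output files that is harmless, but some awk implementations limit how many files can be open at once, so if the scheme is changed to produce many more parts it is safer to close each finished part. A minimal sketch with the close() in place follows; the part_ file name prefix is a hypothetical choice, not part of the original answer.

```shell
# Test input as in the answer: the first line ("1") plays the header.
seq 1 1234567 > file

# Same logic as the one-liner, plus close() on each finished part so that
# only one output file is open at a time.
awk '
NR == 1 {
    n = 1000              # size of the first part (header included)
    h = $0                # remember the header
    out = "part_" n       # hypothetical output file name
}
{
    print > out           # write the current record to the current part
}
NR == n + c {             # current part is full
    close(out)            # release the finished file
    n *= 10               # next part is ten times larger
    c = NR - 1            # offset so the next boundary is at NR == n + c
    out = "part_" n
    print h > out         # every new part starts with the header
}' file
```

The line counts of part_1000 through part_10000000 should match the wc output below; only the file names differ.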

Count the records:

$ wc -l 1000*
   1000 1000
  10000 10000
 100000 100000
1000000 1000000
 123571 10000000
1234571 total
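These counts include each file's header line, which is why the total is 4 more than the 1234567 input lines, and it means the first part carries 999 data rows, the second 9999, and so on. If the 1000, 10000, ... in the question are meant as data rows, a variant with a running row counter could look like the sketch below. The rows_ file name prefix is hypothetical.

```shell
# Test input as in the answer; line 1 ("1") is the header.
seq 1 1234567 > file

# Variant: each part gets exactly n data rows, plus its own copy of the header.
awk '
NR == 1 { h = $0; n = 1000; next }   # save the header and do not count it
{
    if (rows == 0) {                 # first data row of a new part
        out = "rows_" n              # hypothetical output file name
        print h > out
    }
    print > out
    if (++rows == n) {               # part is full: close it, grow the target
        close(out)
        rows = 0
        n *= 10
    }
}' file
```

With the seq test file this gives rows_1000, rows_10000, rows_100000 and rows_1000000 at 1001, 10001, 100001 and 1000001 lines, and a final rows_10000000 with the remaining 123566 data rows (123567 lines). Opening each part only when its first data row arrives also avoids a trailing header-only file when the input ends exactly on a part boundary.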


Answered By - James Brown
Answer Checked By - Terry (WPSolving Volunteer)