Thursday, October 13, 2022

[SOLVED] Bash - Read a csv file and print values by using the header name instead of positional values

October 13, 2022 bash

Issue

I'm using this script to read a csv and print specific values which works just fine. However I'm looking for a way that when my input file changes, (e.g. columns are added at random places or 2 existing columns are switched from position), my script would still work. A solution would be if I could use the header column names, not in a positional way as I'm doing now. However I can't find exactly how to do that. Anyone got an idea?

Input file:

db_name;db_country;db_email;db_phone;db_address
Hedwig Guthrie;Vietnam;[email protected];1-749-430-8866;"8087 Eget, Ave"
Mary Taylor;Vietnam;[email protected];1-221-754-0377;"146-561 Proin Rd."

#!/bin/bash

exec < input.csv
IFS=';'
read header

while read name country email phone address 
do
    echo "Name: " ${name}
    echo "country: " ${country}
    echo "E-mail: " ${email}
    echo "Telephone: " ${phone}
    echo "Address: " ${address}
    echo "========================"
done

Output:

Name:  Hedwig Guthrie
country:  Vietnam
E-mail:  [email protected]
Telephone:  1-749-430-8866
Address:  "8087 Eget, Ave"
========================
Name:  Mary Taylor
country:  Vietnam
E-mail:  [email protected]
Telephone:  1-221-754-0377
Address:  "146-561 Proin Rd."
========================

Solution

Using any awk and with lots of intermediate variables and meaningful variable names to hopefully make the code easy to understand (I'm using the term "tag" to mean the names of the input columns/output rows):

$ cat tst.sh
#!/usr/bin/env bash

awk -F ';' -v OFS=': ' '
    BEGIN {
        numOutTags = split("Name;country;E-mail;Telephone;Address",outNrs2outTags)
        split("db_name;db_country;db_email;db_phone;db_address",outNrs2inTags)
    }
    NR==1 {
        for ( inTagNr=1; inTagNr<=NF; inTagNr++ ) {
            inTag = $inTagNr
            inTags2inNrs[inTag] = inTagNr
        }
        next
    }
    {
        for ( outTagNr=1; outTagNr<=numOutTags; outTagNr++ ) {
            outTag  = outNrs2outTags[outTagNr]
            inTag   = outNrs2inTags[outTagNr]
            inTagNr = inTags2inNrs[inTag]
            print outTag, $inTagNr
        }
        print "========================"
    }
' "${@:--}"

$ ./tst.sh file
Name: Hedwig Guthrie
country: Vietnam
E-mail: [email protected]
Telephone: 1-749-430-8866
Address: "8087 Eget, Ave"
========================
Name: Mary Taylor
country: Vietnam
E-mail: [email protected]
Telephone: 1-221-754-0377
Address: "146-561 Proin Rd."
========================

Now let's shuffle the order of the input columns and add a couple of additional columns you don't care about:

$ cat file
db_email;garbage;db_address;db_name;more_garbage;db_phone;db_country
[email protected];foo;"8087 Eget, Ave";Hedwig Guthrie;stuff;1-749-430-8866;Vietnam
[email protected];bar;"146-561 Proin Rd.";Mary Taylor;nonsense;1-221-754-0377;Vietnam

$ ./tst.sh file
Name: Hedwig Guthrie
country: Vietnam
E-mail: [email protected]
Telephone: 1-749-430-8866
Address: "8087 Eget, Ave"
========================
Name: Mary Taylor
country: Vietnam
E-mail: [email protected]
Telephone: 1-221-754-0377
Address: "146-561 Proin Rd."
========================

and as you can see the script is agnostic to those input changes.

Just list the input and output tags (names) you care about in the same order in the 2 split() commands to define columns you want output, what order you want them output in, what their output tags should be, and map between the output tags and the input tags.

The above will run orders of magnitude faster than and be more robust and portable than a shell loop, see why-is-using-a-shell-loop-to-process-text-considered-bad-practice.

Answered By - Ed Morton

Answer Checked By - Katrina (WPSolving Volunteer)

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Thursday, October 13, 2022

[SOLVED] Bash - Read a csv file and print values by using the header name instead of positional values

Issue

Solution

Popular Posts

Labels