Issue
I'm using this script to read a csv and print specific values which works just fine. However I'm looking for a way that when my input file changes, (e.g. columns are added at random places or 2 existing columns are switched from position), my script would still work. A solution would be if I could use the header column names, not in a positional way as I'm doing now. However I can't find exactly how to do that. Anyone got an idea?
Input file:
db_name;db_country;db_email;db_phone;db_address
Hedwig Guthrie;Vietnam;[email protected];1-749-430-8866;"8087 Eget, Ave"
Mary Taylor;Vietnam;[email protected];1-221-754-0377;"146-561 Proin Rd."
#!/bin/bash
exec < input.csv
IFS=';'
read header
while read name country email phone address
do
echo "Name: " ${name}
echo "country: " ${country}
echo "E-mail: " ${email}
echo "Telephone: " ${phone}
echo "Address: " ${address}
echo "========================"
done
Output:
Name: Hedwig Guthrie
country: Vietnam
E-mail: [email protected]
Telephone: 1-749-430-8866
Address: "8087 Eget, Ave"
========================
Name: Mary Taylor
country: Vietnam
E-mail: [email protected]
Telephone: 1-221-754-0377
Address: "146-561 Proin Rd."
========================
Solution
Using any awk and with lots of intermediate variables and meaningful variable names to hopefully make the code easy to understand (I'm using the term "tag" to mean the names of the input columns/output rows):
$ cat tst.sh
#!/usr/bin/env bash
awk -F ';' -v OFS=': ' '
BEGIN {
numOutTags = split("Name;country;E-mail;Telephone;Address",outNrs2outTags)
split("db_name;db_country;db_email;db_phone;db_address",outNrs2inTags)
}
NR==1 {
for ( inTagNr=1; inTagNr<=NF; inTagNr++ ) {
inTag = $inTagNr
inTags2inNrs[inTag] = inTagNr
}
next
}
{
for ( outTagNr=1; outTagNr<=numOutTags; outTagNr++ ) {
outTag = outNrs2outTags[outTagNr]
inTag = outNrs2inTags[outTagNr]
inTagNr = inTags2inNrs[inTag]
print outTag, $inTagNr
}
print "========================"
}
' "${@:--}"
$ ./tst.sh file
Name: Hedwig Guthrie
country: Vietnam
E-mail: [email protected]
Telephone: 1-749-430-8866
Address: "8087 Eget, Ave"
========================
Name: Mary Taylor
country: Vietnam
E-mail: [email protected]
Telephone: 1-221-754-0377
Address: "146-561 Proin Rd."
========================
Now let's shuffle the order of the input columns and add a couple of additional columns you don't care about:
$ cat file
db_email;garbage;db_address;db_name;more_garbage;db_phone;db_country
[email protected];foo;"8087 Eget, Ave";Hedwig Guthrie;stuff;1-749-430-8866;Vietnam
[email protected];bar;"146-561 Proin Rd.";Mary Taylor;nonsense;1-221-754-0377;Vietnam
$ ./tst.sh file
Name: Hedwig Guthrie
country: Vietnam
E-mail: [email protected]
Telephone: 1-749-430-8866
Address: "8087 Eget, Ave"
========================
Name: Mary Taylor
country: Vietnam
E-mail: [email protected]
Telephone: 1-221-754-0377
Address: "146-561 Proin Rd."
========================
and as you can see the script is agnostic to those input changes.
Just list the input and output tags (names) you care about in the same order in the 2 split()
commands to define columns you want output, what order you want them output in, what their output tags should be, and map between the output tags and the input tags.
The above will run orders of magnitude faster than and be more robust and portable than a shell loop, see why-is-using-a-shell-loop-to-process-text-considered-bad-practice.
Answered By - Ed Morton Answer Checked By - Katrina (WPSolving Volunteer)