Issue
I have a tab-delimited text file called "data.txt" that looks like this:
data.txt
col2 col3 col4 col1
val1 val5 val9 val13
val2 val6 val10 val14
val3 val7 val11 val15
val4 val8 val12 val16
...
I also have an array col_order = [col1, col2, col3, col4].
The objective is to rearrange the columns in "data.txt" into the order given by the array "col_order", using a shell script.
Final Output
col1 col2 col3 col4
val13 val1 val5 val9
val14 val2 val6 val10
val15 val3 val7 val11
val16 val4 val8 val12
My Progress So Far
awk 'BEGIN{FS=OFS="\t"} NR==1{for (i=1; i<=NF; i++) f[$i] = i} {print $(f["col1"]), $(f["col2"]), $(f["col3"]), $(f["col4"])}' data.txt > data_corrected.txt
The above statement works as expected, but the column order is hard-coded in it. I couldn't figure out how to take the order from the array and pass it into the statement.
I'm open to any other approach as well.
Solution
Could you please try the following.
cat script.bash
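List=( col1 col2 col3 col4 )
## "${List[*]}" joins the array elements with spaces: "col1 col2 col3 col4"
awk -v bash_arr_val="${List[*]}" '
BEGIN{
  FS=OFS="\t"
  ## Split the space-joined names back into array[1..num]
  num=split(bash_arr_val,array," ")
  ## Map each column name to its desired output position
  for(i=1;i<=num;i++){
    array_with_bash_values_as_index[array[i]]=i
  }
}
FNR==1{
  ## From the header line, map each output position to its input field number
  for(i=1;i<=NF;i++){
    if($i in array_with_bash_values_as_index){
      actual_array[array_with_bash_values_as_index[$i]]=i
    }
  }
}
{
  ## Print the fields in the requested order, ending the line after the last one
  for(i=1;i<=num;i++){
    printf("%s%s",$actual_array[i],i==num?ORS:OFS)
  }
}
' data.txt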
The output will be as follows:
col1 col2 col3 col4
val13 val1 val5 val9
val14 val2 val6 val10
val15 val3 val7 val11
val16 val4 val8 val12
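The key step in the script above is the hand-off from bash to awk: "${List[*]}" expands to a single word with the array elements joined by the first character of IFS (a space by default), -v stores that word in the awk variable bash_arr_val, and split() turns it back into an array and returns its size. A minimal sketch of just that hand-off; the variable names joined and mapping are illustrative only:

```shell
#!/usr/bin/env bash
List=( col1 col2 col3 col4 )

# "${List[*]}" expands to ONE word, with the elements joined by the
# first character of IFS (a space by default)
joined="${List[*]}"
printf '%s\n' "$joined"

# awk receives that word via -v; split() rebuilds the list and
# returns the number of elements it found
mapping=$(awk -v bash_arr_val="$joined" 'BEGIN {
    num = split(bash_arr_val, array, " ")
    for (i = 1; i <= num; i++)
        printf "%s=%d\n", array[i], i
}')
printf '%s\n' "$mapping"
```

Because the names travel as one space-separated string, this assumes no column name contains a space.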
EDIT by Ed Morton with variable-name suggestions:
$ cat tst.awk
BEGIN {
    numOutFlds = split(bash_arr_val,outNr2name)
    for ( outNr=1; outNr<=numOutFlds; outNr++ ) {
        fldName = outNr2name[outNr]
        name2outNr[fldName] = outNr
    }
}
FNR==1 {
    for ( inNr=1; inNr<=NF; inNr++ ) {
        fldName = $inNr
        outNr = name2outNr[fldName]
        outNr2inNr[outNr] = inNr
    }
}
{
    for ( outNr=1; outNr<=numOutFlds; outNr++ ) {
        inNr = outNr2inNr[outNr]
        fldValue = $inNr
        printf "%s%s", fldValue, (outNr<numOutFlds ? OFS : ORS)
    }
}
$ awk -v bash_arr_val='col1 col2 col3 col4' -f tst.awk file
col1 col2 col3 col4
val13 val1 val5 val9
val14 val2 val6 val10
val15 val3 val7 val11
val16 val4 val8 val12
And here's how I'd really write it if I were just creating a script for myself, rather than using so many temp variables etc. to make it as clear as possible for others to understand:
$ cat tst.awk
BEGIN {
    numOutFlds = split(bash_arr_val,outNr2name)
    for ( outNr=1; outNr<=numOutFlds; outNr++ ) {
        name2outNr[outNr2name[outNr]] = outNr
    }
}
FNR==1 {
    for ( inNr=1; inNr<=NF; inNr++ ) {
        f[name2outNr[$inNr]] = inNr
    }
}
{
    for ( outNr=1; outNr<=numOutFlds; outNr++ ) {
        printf "%s%s", $(f[outNr]), (outNr<numOutFlds ? OFS : ORS)
    }
}
$ awk -v bash_arr_val='col1 col2 col3 col4' -f tst.awk file
col1 col2 col3 col4
val13 val1 val5 val9
val14 val2 val6 val10
val15 val3 val7 val11
val16 val4 val8 val12
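The versions above assume every requested name appears in the header. If one does not, its slot is never filled, the field reference evaluates to $0, and the whole input line is printed in its place. Ed Morton's scripts also leave OFS at its default of a single space. Below is a sketch of one way to handle both for tab-delimited data: a hypothetical bash function reorder_columns (not part of either answer) that takes the file followed by the column names, joins the names with tabs so names containing spaces survive, and exits with an error naming the first missing column.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from either answer):
#   reorder_columns FILE NAME...
# Prints the tab-delimited FILE with its columns in the order of NAME...
reorder_columns() {
    local file=$1
    shift
    local names
    # Join the names with tabs; IFS is changed only inside this subshell
    names=$(IFS=$'\t'; printf '%s' "$*")
    awk -F'\t' -v OFS='\t' -v names="$names" '
        BEGIN {
            # Explicit "\t" separator: names may contain spaces
            n = split(names, outNr2name, "\t")
            for (o = 1; o <= n; o++) name2outNr[outNr2name[o]] = o
        }
        FNR == 1 {
            for (i = 1; i <= NF; i++)
                if ($i in name2outNr) f[name2outNr[$i]] = i
            # Fail loudly instead of printing $0 for a missing column
            for (o = 1; o <= n; o++)
                if (!(o in f)) {
                    printf "missing column: %s\n", outNr2name[o] > "/dev/stderr"
                    exit 1
                }
        }
        {
            for (o = 1; o <= n; o++)
                printf "%s%s", $(f[o]), (o < n ? OFS : ORS)
        }
    ' "$file"
}
```

Usage, with the array from the question: col_order=( col1 col2 col3 col4 ); reorder_columns data.txt "${col_order[@]}" > data_corrected.txt. Since awk's -v processes backslash escapes, this also assumes the column names contain no backslashes.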
Answered By - RavinderSingh13 Answer Checked By - Mary Flores (WPSolving Volunteer)