Issue
I have time series data in which measurement values from different sensors were captured asynchronously and concatenated into the same ASCII file. In the first sample below, the sensor values share the same time instant.
The values are whitespace-separated.
The original file looks like this:
2022 281 08 48 14 876 10 1.00 NOTSAMPLED NOTSAMPLED
2022 281 08 48 14 876 10 NOTSAMPLED 0.00 NOTSAMPLED
2022 281 08 48 14 876 10 NOTSAMPLED NOTSAMPLED 1.00
2022 281 08 48 15 391 11 1.00 NOTSAMPLED NOTSAMPLED
2022 281 08 48 15 391 11 NOTSAMPLED 0.00 NOTSAMPLED
2022 281 08 48 15 391 11 NOTSAMPLED NOTSAMPLED 1.00
2022 281 08 48 15 896 12 1.00 NOTSAMPLED NOTSAMPLED
2022 281 08 48 15 896 12 NOTSAMPLED 0.00 NOTSAMPLED
2022 281 08 48 15 896 12 NOTSAMPLED NOTSAMPLED 1.00
Now I need to replace the string NOTSAMPLED with the most recent previous value from the same sensor, as shown below, and also merge the sensor values spread across multiple rows and columns into a single row per timestamp.
2022 281 08 48 14 876 10 1.00 0.0 1.0
2022 281 08 48 15 391 11 1.00 0.0 1.0
2022 281 08 48 15 896 12 1.00 0.0 1.0
Similarly, if the input data is
2022 281 08 48 14 876 10 1.00 NOTSAMPLED NOTSAMPLED
2022 281 08 48 14 876 10 NOTSAMPLED 0.00 NOTSAMPLED
2022 281 08 48 14 880 10 NOTSAMPLED NOTSAMPLED 10.00
2022 281 08 48 15 391 11 1.00 NOTSAMPLED NOTSAMPLED
2022 281 08 48 15 391 11 NOTSAMPLED 0.00 NOTSAMPLED
2022 281 08 48 15 395 11 NOTSAMPLED NOTSAMPLED 11.00
2022 281 08 48 15 896 12 1.00 NOTSAMPLED NOTSAMPLED
2022 281 08 48 15 896 12 NOTSAMPLED 0.00 NOTSAMPLED
2022 281 08 48 15 900 12 NOTSAMPLED NOTSAMPLED 12.00
then my expected output is
2022 281 08 48 14 876 10 1.00 0.00 NOTSAMPLED
2022 281 08 48 14 880 10 1.00 0.00 10.00
2022 281 08 48 15 391 11 1.00 0.00 10.00
2022 281 08 48 15 395 11 1.00 0.00 11.00
2022 281 08 48 15 896 12 1.00 0.00 11.00
2022 281 08 48 15 900 12 1.00 0.00 12.00
How can this be achieved using sed/awk or any other bash shell scripting commands?
I tried the following script:
#! /bin/bash
inp_filename=$1
awk '
NR == 1 { split($0, filldown) }
{
    for (i = 6; i <= NF; i++)
        if ($i != "NOTSAMPLED")
            filldown[i] = $i
        else
            $i = filldown[i]
    print
}
' "$inp_filename"
But the result is
2022 281 08 48 14 876 10 1.00 NOTSAMPLED NOTSAMPLED
2022 281 08 48 14 876 10 1.00 0.00 NOTSAMPLED
2022 281 08 48 14 876 10 1.00 0.00 1.00
2022 281 08 48 15 391 11 1.00 0.00 NOTSAMPLED
2022 281 08 48 15 391 11 1.00 0.00 NOTSAMPLED
2022 281 08 48 15 391 11 1.00 0.00 1.00
2022 281 08 48 15 896 12 1.00 0.00 NOTSAMPLED
2022 281 08 48 15 896 12 1.00 0.00 NOTSAMPLED
2022 281 08 48 15 896 12 1.00 0.00 1.00
Solution
One awk idea:
awk '
function print_values(    j) {
    if (FNR>1) {
        printf "%s", ts_prev                # print previous date/time stamp
        for (j=8;j<=NF;j++)                 # loop through sensor values and append to current line of output
            printf "%s%s", OFS, (values[j]=="" ? "NOTSAMPLED" : values[j])
        print ""                            # terminate current line of output
    }
    ts_prev = ts_curr
}

{ gsub(/\r/,"")                             # per comment from OP, need to remove Windows/DOS line endings
  ts_curr=$1                                # save date/time stamp of current line
  for (i=2;i<=7;i++)
      ts_curr = ts_curr FS $i
  if (ts_curr != ts_prev)                   # if this is a new date/time stamp then ...
      print_values()                        # print previous date/time stamp and associated sensor values
  for (i=8;i<=NF;i++)                       # loop through values and ...
      if ($i != "NOTSAMPLED")               # if a valid value then ...
          values[i]=$i                      # save the value
}

END { print_values() }                      # flush last date/time stamp to stdout
' sensor.dat
This generates:
2022 281 08 48 14 876 10 1.00 0.00 NOTSAMPLED
2022 281 08 48 14 880 10 1.00 0.00 10.00
2022 281 08 48 15 391 11 1.00 0.00 10.00
2022 281 08 48 15 395 11 1.00 0.00 11.00
2022 281 08 48 15 896 12 1.00 0.00 11.00
2022 281 08 48 15 900 12 1.00 0.00 12.00
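If it helps, here is a rough sketch of the same idea wrapped in a shell function that reads from stdin, so it can be tried on a small sample without creating sensor.dat. The names merge_sensors, emit_row, seen and nf are made up for this illustration and are not part of the answer above. Two small variations, both assumptions on my part: the widest row seen so far is tracked in nf so that the END block does not need to read NF, and a seen flag replaces the FNR>1 check. It still assumes the first seven fields form the timestamp and that sensor values start at field 8.

```shell
#!/bin/bash
# merge_sensors: hypothetical helper name; reads sensor rows on stdin and
# prints one forward-filled row per distinct timestamp.
merge_sensors() {
    awk '
    function emit_row(    j) {
        if (seen) {                          # nothing to print before the first row
            printf "%s", ts_prev             # previous timestamp (fields 1-7)
            for (j = 8; j <= nf; j++)        # latest known value of each sensor
                printf "%s%s", OFS, ((j in values) ? values[j] : "NOTSAMPLED")
            print ""
        }
        ts_prev = ts_curr
        seen = 1
    }
    {
        gsub(/\r/, "")                       # strip Windows/DOS carriage returns
        if (NF > nf) nf = NF                 # remember the widest row seen so far
        ts_curr = $1
        for (i = 2; i <= 7; i++)
            ts_curr = ts_curr FS $i
        if (ts_curr != ts_prev)              # new timestamp: flush the previous one
            emit_row()
        for (i = 8; i <= NF; i++)
            if ($i != "NOTSAMPLED")
                values[i] = $i               # carry this sensor value forward
    }
    END { if (seen) emit_row() }             # flush the last timestamp
    '
}

# Try it on the first two timestamps of the first sample from the question.
merge_sensors <<'EOF'
2022 281 08 48 14 876 10 1.00 NOTSAMPLED NOTSAMPLED
2022 281 08 48 14 876 10 NOTSAMPLED 0.00 NOTSAMPLED
2022 281 08 48 14 876 10 NOTSAMPLED NOTSAMPLED 1.00
2022 281 08 48 15 391 11 1.00 NOTSAMPLED NOTSAMPLED
2022 281 08 48 15 391 11 NOTSAMPLED 0.00 NOTSAMPLED
2022 281 08 48 15 391 11 NOTSAMPLED NOTSAMPLED 1.00
EOF
```

For that input the three rows of each timestamp should collapse into a single row, with the values printed exactly as they appear in the input (1.00 0.00 1.00). To run it on a file instead, redirect the file into the function, for example merge_sensors < sensor.dat.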
Answered By - markp-fuso Answer Checked By - Candace Johnson (WPSolving Volunteer)