Issue
I have an air-gapped system (so limited in software access) that generates usage logs daily. The logs have unique IDs for devices that I've managed to scrape in the past and pump out to a CSV, which I would then clean up in LibreCalc (related to this question I asked here - https://superuser.com/questions/1732415/find-next-matching-event-in-log-and-compare-timings) and get event durations for each one.
This is getting arduous as more devices are added, so I wish to automate the calculation of the total duration for each device and how many events occurred for that device. I've had some suggestions of using awk/sed and I'm a bit lost on how to implement it.
Log Example
message="device02 connected" event_ts=2023-01-10T09:20:21Z
message="device05 connected" event_ts=2023-01-10T09:21:31Z
message="device02 disconnected" event_ts=2023-01-10T09:21:56Z
message="device04 connected" event_ts=2023-01-10T11:12:28Z
message="device05 disconnected" event_ts=2023-01-10T15:26:36Z
message="device04 disconnected" event_ts=2023-01-10T18:23:32Z
I already have a bash script that scrapes these events from the log files in the folder and then outputs it all to a csv.
#!/bin/bash
# Just a datetime stamp for the flatfile
now=$(date +"%Y%m%d")
# Log file path, also where I define what month to scrape
LOGFILE='local.log-202301*'
# Shows what log files are getting read
echo $LOGFILE
# Output line by line to csv
awk '(/connect/ && ORS="\n") || (/disconnect/ && ORS=RS) {field1_var=$1" "$2" "$3","; print field1_var}' $LOGFILE > /home/user/logs/LOG_$now.csv
Ideally I'd like to keep that process so I can manually inspect the file if necessary. But ultimately I'd prefer to automate the event calculations to produce something like below:
Desired Output Example
Device     Total Connection Duration   Total Connections
device01   0h 0m 0s                    0
device02   0h 1m 35s                   1
device03   0h 0m 0s                    0
device04   7h 11m 4s                   1
device05   6h 5m 5s                    1
Hopefully that's enough info; any help or pointers would be greatly appreciated. Thanks.
Solution
This isn't based on your script at all, since I didn't get it to produce a CSV, but anyway...
Here's an AWK script that computes the desired result for the given example log file:
# Requires GNU awk (gawk) for mktime().
# Convert two ISO 8601 timestamps to epoch seconds and return the difference.
function time_lapsed(from, to) {
    # mktime() wants "YYYY MM DD HH MM SS", so replace every non-digit with a space
    gsub(/[^0-9 ]/, " ", from);
    gsub(/[^0-9 ]/, " ", to);
    return mktime(to) - mktime(from);
}

BEGIN { OFS = "\t"; }

# Connect events (the leading space keeps "disconnected" from matching here too)
/ connected/ {
    split($1, a, "=\"");                  # a[2] = device identifier
    split($3, b, "=");                    # b[2] = timestamp
    device_connected_at[a[2]] = b[2];
    device_connection_count[a[2]]++;
}

# Disconnect events
/disconnected/ {
    split($1, a, "=\"");
    split($3, b, "=");
    device_connection_duration[a[2]] += time_lapsed(device_connected_at[a[2]], b[2]);
}

END {
    print "Device", "Total Connection Duration", "Total Connections";
    for (device in device_connection_duration) {
        total = device_connection_duration[device];
        # Format the duration arithmetically; strftime() would shift it by the
        # local UTC offset and wrap around after 24 hours.
        printf "%s%s%dh %dm %ds%s%d\n", device, OFS,
            int(total / 3600), int((total % 3600) / 60), total % 60, OFS,
            device_connection_count[device];
    }
}
I used it on this example log file
message="device02 connected" event_ts=2023-01-10T09:20:21Z
message="device05 connected" event_ts=2023-01-10T09:21:31Z
message="device02 disconnected" event_ts=2023-01-10T09:21:56Z
message="device04 connected" event_ts=2023-01-10T11:12:28Z
message="device06 connected" event_ts=2023-01-10T11:12:28Z
message="device05 disconnected" event_ts=2023-01-10T15:26:36Z
message="device02 connected" event_ts=2023-01-10T19:20:21Z
message="device04 disconnected" event_ts=2023-01-10T18:23:32Z
message="device02 disconnected" event_ts=2023-01-10T21:41:33Z
And it produces this output
Device     Total Connection Duration   Total Connections
device02   2h 22m 47s                  2
device04   7h 11m 4s                   1
device05   6h 5m 5s                    1
You can pass this program to awk without any special flags and it should just work (it relies on awk's default field and record separators, so don't override FS or RS). The one requirement is GNU awk (gawk), because mktime() is a gawk extension; if your system's awk is not gawk, call gawk explicitly.
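For example, if you save the script as durations.awk (that name, and the output path, are just placeholders based on the paths in your own script), you could run it over the month's logs like this:

gawk -f durations.awk local.log-202301* > /home/user/logs/durations_202301.tsv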
Let me explain what's going on:
First we define the time_lapsed function. In that function we first convert the ISO 8601 timestamps into the format that mktime can handle (YYYY MM DD HH MM SS); we simply drop the offset since it's all UTC. We then compute the difference of the Epoch timestamps that mktime returns and return that result.
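To make that concrete, here is a standalone one-off you can paste into a shell (it needs gawk, and the timestamp is just the first one from the sample log):

gawk 'BEGIN {
    ts = "2023-01-10T09:20:21Z";
    gsub(/[^0-9 ]/, " ", ts);   # ts is now "2023 01 10 09 20 21 "
    print ts;
    print mktime(ts);           # seconds since the Epoch (interpreted in the local time zone;
                                # the script only ever uses the difference of two such values)
}'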
Next, in the BEGIN block we define the output field separator OFS to be a tab.
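If OFS is new to you, this one-liner shows the effect; it is purely illustrative and not part of the solution:

awk 'BEGIN { OFS = "\t"; print "Device", "Total Connection Duration", "Total Connections" }'

The three headings come out separated by tab characters instead of the default single space.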
Then we define two rules, one for log lines when the device connected and one for when the device disconnected.
Due to the default field separator the input to these rules looks like this:
$1: message="device02
$2: connected"
$3: event_ts=2023-01-10T09:20:21Z
We don't care about $2. We use split to get the device identifier and the timestamp from $1 and $3 respectively.
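For example, on the first sample line the two split calls behave like this (again a standalone check, not part of the script):

awk 'BEGIN {
    split("message=\"device02", a, "=\"");           # a[1] = "message", a[2] = "device02"
    split("event_ts=2023-01-10T09:20:21Z", b, "=");  # b[1] = "event_ts", b[2] = "2023-01-10T09:20:21Z"
    print a[2], b[2];                                # device02 2023-01-10T09:20:21Z
}'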
In the rule for a device connecting, using the device identifier as the key, we store when the device connected and increase the connection count for that device. We don't need to assign 0 initially, because awk's associative arrays return "" for keys that have no entry yet, and that empty string is coerced to 0 when we increment it.
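You can see that coercion in isolation with this throwaway one-liner:

awk 'BEGIN { count["device02"]++; print count["device02"] }'   # prints 1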
In the rule for a device disconnecting we compute the time lapsed and add that to the total time elapsed for that device.
Note that this requires every connect to have a matching disconnect in the logs, i.e. it is fairly fragile: a missing connect line will mess up the calculation of the total connection time, and a missing disconnect line will increase the connection count but not the total connection time.
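If you want to guard against unmatched disconnects, one option (an untested sketch, not something the script above does) is to only add a duration when a connect was actually recorded for that device, and to clear the stored timestamp once it has been used:

# Drop-in replacement for the disconnect rule: skip disconnects with no recorded connect
/disconnected/ {
    split($1, a, "=\"");
    split($3, b, "=");
    if (a[2] in device_connected_at) {
        device_connection_duration[a[2]] += time_lapsed(device_connected_at[a[2]], b[2]);
        delete device_connected_at[a[2]];
    }
}

A similar check could be added for connects that never see a disconnect, but that depends on how you want to count sessions that are still open at the end of the log.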
In the END rule we print the desired output header and, for every entry in the associative array device_connection_duration, we print the device identifier, total connection duration and total connection count.
I hope this gives you some ideas on how to solve your task.
Answered By - kaikuchn