Issue
I have an air-gapped system (so limited in software access) that generates usage logs daily. The logs have unique IDs for devices that I've managed to scrape in the past and pump out to a CSV, which I would then clean up in LibreCalc (related to this question I asked here - https://superuser.com/questions/1732415/find-next-matching-event-in-log-and-compare-timings) and get event durations for each one.
This is getting arduous as more devices are added, so I wish to automate the calculation of the total duration for each device and how many events occurred for that device. I've had some suggestions of using awk/sed and I'm a bit lost on how to implement it.
Log Example
message="device02 connected" event_ts=2023-01-10T09:20:21Z
message="device05 connected" event_ts=2023-01-10T09:21:31Z
message="device02 disconnected" event_ts=2023-01-10T09:21:56Z
message="device04 connected" event_ts=2023-01-10T11:12:28Z
message="device05 disconnected" event_ts=2023-01-10T15:26:36Z
message="device04 disconnected" event_ts=2023-01-10T18:23:32Z
I already have a bash script that scrapes these events from the log files in the folder and then outputs it all to a csv.
#!/bin/bash
# Just a datetime stamp for the flatfile
now=$(date +"%Y%m%d")
# Log file path, also where I define what month to scrape
LOGFILE='local.log-202301*'
# Shows what log files are getting read
echo $LOGFILE
# Output line by line to csv
awk '(/connect/ && ORS="\n") || (/disconnect/ && ORS=RS) {field1_var=$1" "$2" "$3","; print field1_var}' $LOGFILE > /home/user/logs/LOG_$now.csv
Ideally I'd like to keep that process so I can manually inspect the file if necessary. But ultimately I'd prefer to automate the event calculations to produce something like below:
Desired Output Example
Device     Total Connection Duration   Total Connections
device01   0h 0m 0s                    0
device02   0h 1m 35s                   1
device03   0h 0m 0s                    0
device04   7h 11m 4s                   1
device05   6h 5m 5s                    1
Hopefully that's enough info; any help or pointers would be greatly appreciated. Thanks.
Solution
This isn't based on your script at all, since I didn't get it to produce a CSV, but anyway...
Here's an AWK script that computes the desired result for the given example log file:
# Requires GNU awk (gawk) for mktime().
# Convert two ISO 8601 timestamps to epoch seconds and return the difference.
function time_lapsed(from, to) {
    # mktime() wants "YYYY MM DD HH MM SS", so replace every non-digit with a space
    gsub(/[^0-9 ]/, " ", from);
    gsub(/[^0-9 ]/, " ", to);
    return mktime(to) - mktime(from);
}

BEGIN { OFS = "\t"; }

# Connect events (the leading space keeps "disconnected" from matching here too)
/ connected/ {
    split($1, a, "=\"");                  # a[2] = device identifier
    split($3, b, "=");                    # b[2] = timestamp
    device_connected_at[a[2]] = b[2];
    device_connection_count[a[2]]++;
}

# Disconnect events
/disconnected/ {
    split($1, a, "=\"");
    split($3, b, "=");
    device_connection_duration[a[2]] += time_lapsed(device_connected_at[a[2]], b[2]);
}

END {
    print "Device", "Total Connection Duration", "Total Connections";
    for (device in device_connection_duration) {
        total = device_connection_duration[device];
        # Format the duration arithmetically; strftime() would shift it by the
        # local UTC offset and wrap around after 24 hours.
        printf "%s%s%dh %dm %ds%s%d\n", device, OFS,
            int(total / 3600), int((total % 3600) / 60), total % 60, OFS,
            device_connection_count[device];
    }
}
I used it on this example log file
message="device02 connected" event_ts=2023-01-10T09:20:21Z
message="device05 connected" event_ts=2023-01-10T09:21:31Z
message="device02 disconnected" event_ts=2023-01-10T09:21:56Z
message="device04 connected" event_ts=2023-01-10T11:12:28Z
message="device06 connected" event_ts=2023-01-10T11:12:28Z
message="device05 disconnected" event_ts=2023-01-10T15:26:36Z
message="device02 connected" event_ts=2023-01-10T19:20:21Z
message="device04 disconnected" event_ts=2023-01-10T18:23:32Z
message="device02 disconnected" event_ts=2023-01-10T21:41:33Z
And it produces this output
Device     Total Connection Duration   Total Connections
device02   2h 22m 47s                  2
device04   7h 11m 4s                   1
device05   6h 5m 5s                    1
You can pass this program to awk without any special flags and it should just work (it relies on awk's default field and record separators, so don't override FS or RS). The one requirement is GNU awk (gawk), because mktime() is a gawk extension; if your system's awk is not gawk, call gawk explicitly.
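For example, if you save the script as durations.awk (that name, and the output path, are just placeholders based on the paths in your own script), you could run it over the month's logs like this:

gawk -f durations.awk local.log-202301* > /home/user/logs/durations_202301.tsv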
Let me explain what's going on:
First we define the time_lapsed function. In that function we first convert the ISO 8601 timestamps into the format that mktime can handle (YYYY MM DD HH MM SS); we simply drop the offset since it's all UTC. We then compute the difference of the Epoch timestamps that mktime returns and return that result.
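To make that concrete, here is a standalone one-off you can paste into a shell (it needs gawk, and the timestamp is just the first one from the sample log):

gawk 'BEGIN {
    ts = "2023-01-10T09:20:21Z";
    gsub(/[^0-9 ]/, " ", ts);   # ts is now "2023 01 10 09 20 21 "
    print ts;
    print mktime(ts);           # seconds since the Epoch (interpreted in the local time zone;
                                # the script only ever uses the difference of two such values)
}'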
Next, in the BEGIN block we define the output field separator OFS to be a tab.
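If OFS is new to you, this one-liner shows the effect; it is purely illustrative and not part of the solution:

awk 'BEGIN { OFS = "\t"; print "Device", "Total Connection Duration", "Total Connections" }'

The three headings come out separated by tab characters instead of the default single space.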
Then we define two rules, one for log lines when the device connected and one for when the device disconnected.
Due to the default field separator the input to these rules looks like this:
$1: message="device02
$2: connected"
$3: event_ts=2023-01-10T09:20:21Z
We don't care about $2. We use split to get the device identifier and the timestamp from $1 and $3 respectively.
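For example, on the first sample line the two split calls behave like this (again a standalone check, not part of the script):

awk 'BEGIN {
    split("message=\"device02", a, "=\"");           # a[1] = "message", a[2] = "device02"
    split("event_ts=2023-01-10T09:20:21Z", b, "=");  # b[1] = "event_ts", b[2] = "2023-01-10T09:20:21Z"
    print a[2], b[2];                                # device02 2023-01-10T09:20:21Z
}'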
In the rule for a device connecting, using the device identifier as the key, we store when the device connected and increase the connection count for that device. We don't need to assign 0 initially, because awk's associative arrays return "" for keys that have no entry yet, and that empty string is coerced to 0 when we increment it.
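You can see that coercion in isolation with this throwaway one-liner:

awk 'BEGIN { count["device02"]++; print count["device02"] }'   # prints 1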
In the rule for a device disconnecting we compute the time lapsed and add that to the total time elapsed for that device.
Note that this requires every connect to have a matching disconnect in the logs, i.e. it is fairly fragile: a missing connect line will mess up the calculation of the total connection time, and a missing disconnect line will increase the connection count but not the total connection time.
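If you want to guard against unmatched disconnects, one option (an untested sketch, not something the script above does) is to only add a duration when a connect was actually recorded for that device, and to clear the stored timestamp once it has been used:

# Drop-in replacement for the disconnect rule: skip disconnects with no recorded connect
/disconnected/ {
    split($1, a, "=\"");
    split($3, b, "=");
    if (a[2] in device_connected_at) {
        device_connection_duration[a[2]] += time_lapsed(device_connected_at[a[2]], b[2]);
        delete device_connected_at[a[2]];
    }
}

A similar check could be added for connects that never see a disconnect, but that depends on how you want to count sessions that are still open at the end of the log.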
In the END rule we print the desired output header and, for every entry in the associative array device_connection_duration, we print the device identifier, total connection duration and total connection count.
I hope this gives you some ideas on how to solve your task.
Answered By - kaikuchn