Friday, July 29, 2022

[SOLVED] How can i reorder a list by the earliest date in the second column when the first column is the same?

Issue

I have lots of lists like the following, the first column is an ID number, the second column is a score and the third column is date of birth in DDMMYYYY format.

111 100 01012011
222 90 01012001
333 90 01012013
444 80 01012015
555 80 01012014
666 70 01012016
777 60 01012017
888 50 01012018

When there are multiple lines with the same score, i would like to reorder them with the newest date at the top, the result for the example would be:

111 100 01012011
333 90 01012013
222 90 01012001
555 80 01012014
444 80 01012015
666 70 01012016
777 60 01012017
888 50 01012018

As you can see the lines with same scores have been re-arranged with the newest date at the top.

I have first tried to select the earliest date, i can do this with the following:

 sort -k1.5 -k1.1,1.2 -k1.3,1.4 | tail -n 1

But i am unsure of how i can acheive the outcome. How can i acheive the result?


Solution

The issue with the current sort attempt is that you're trying to parse the 1st field (-k1.) when what you want is to parse the 3rd field (-k3.).

Setup, adding a few entries with dates other than DDMM == 0101:

$ cat raw.dat
111 100 01012011
222 90 01012001
333 90 01012013
444 80 01012015
555 80 01012014
666 70 01012016
777 60 01012017
888 50 01012018
aaa 35 01082022
bbb 35 23012022
ccc 35 12112022
ddd 35 10122022

One approach, assuming the first sort is for the 2nd field (score) in descending numeric order:

$ sort -t' ' -k2,2nr -k3.5,3.8nr -k3.3,3.4nr -k3.1,3.2nr raw.dat
111 100 01012011
333 90 01012013
222 90 01012001
444 80 01012015
555 80 01012014
666 70 01012016
777 60 01012017
888 50 01012018
ddd 35 10122022
ccc 35 12112022
aaa 35 01082022
bbb 35 23012022

Where:

  • -t ' ' - define delimiter as space (overrides default which is the non-blank to blank transition, which would cause leading space(s) to be counted as part of the field)
  • -k2,2nr - 1st sort (score), start with field 2, end with field 2, sort as numeric and in reverse (aka descending) order
  • -k3.5,3.8nr - 2nd sort (YYYY), start with field 3 and 5th character, end with field 3 and 8th character, sort as numeric and in reverse order (-k3.3,3.4nr and -k3.1,3.2nr perform reverse numeric sorts on MM and DD, respectively)
  • NOTE: OP's expected output shows ID=555 (YYYY=2014) listed before ID=444 (YYYY=2015); I'm assuming this is a typo and that ID=444 should be listed first

Pulling Ed Morton's comments up into this answer ...

  • -b discards all leading spaces (ie, replaces -t ' ') but ...
  • the -b option must be applied at each key when applying flags at the key level; alternatively ...
  • the -b option can be applied as a global flag as long as there are no key-level flags (key-level flags take precedence over - and negate - global level flags)

Applying these rules we get one of the following:

# all global-level flags

sort -nrb -k2,2 -k3.5,3.8 -k3.3,3.4 -k3.1,3.2 raw.dat

# all key-level flags ('start' only needs '-b` flag)

sort -k2b,2bnr -k3.5b,3.8bnr -k3.3b,3.4bnr -k3.1b,3.2bnr raw.dat

# all key-level flags (overkill on the 'start' flags)

sort -k2bnr,2bnr -k3.5bnr,3.8bnr -k3.3bnr,3.4bnr -k3.1bnr,3.2bnr raw.dat

These all generate:

111 100 01012011
333 90 01012013
222 90 01012001
444 80 01012015
555 80 01012014
666 70 01012016
777 60 01012017
888 50 01012018
ddd 35 10122022
ccc 35 12112022
aaa 35 01082022
bbb 35 23012022


Answered By - markp-fuso
Answer Checked By - Marie Seifert (WPSolving Admin)