Issue
I have lots of lists like the following, the first column is an ID number, the second column is a score and the third column is date of birth in DDMMYYYY format.
111 100 01012011
222 90 01012001
333 90 01012013
444 80 01012015
555 80 01012014
666 70 01012016
777 60 01012017
888 50 01012018
When there are multiple lines with the same score, i would like to reorder them with the newest date at the top, the result for the example would be:
111 100 01012011
333 90 01012013
222 90 01012001
555 80 01012014
444 80 01012015
666 70 01012016
777 60 01012017
888 50 01012018
As you can see the lines with same scores have been re-arranged with the newest date at the top.
I have first tried to select the earliest date, i can do this with the following:
sort -k1.5 -k1.1,1.2 -k1.3,1.4 | tail -n 1
But i am unsure of how i can acheive the outcome. How can i acheive the result?
Solution
The issue with the current sort
attempt is that you're trying to parse the 1st field (-k1.
) when what you want is to parse the 3rd field (-k3.
).
Setup, adding a few entries with dates other than DDMM == 0101
:
$ cat raw.dat
111 100 01012011
222 90 01012001
333 90 01012013
444 80 01012015
555 80 01012014
666 70 01012016
777 60 01012017
888 50 01012018
aaa 35 01082022
bbb 35 23012022
ccc 35 12112022
ddd 35 10122022
One approach, assuming the first sort is for the 2nd field (score
) in descending numeric order:
$ sort -t' ' -k2,2nr -k3.5,3.8nr -k3.3,3.4nr -k3.1,3.2nr raw.dat
111 100 01012011
333 90 01012013
222 90 01012001
444 80 01012015
555 80 01012014
666 70 01012016
777 60 01012017
888 50 01012018
ddd 35 10122022
ccc 35 12112022
aaa 35 01082022
bbb 35 23012022
Where:
-t ' '
- define delimiter as space (overrides default which is the non-blank to blank transition, which would cause leading space(s) to be counted as part of the field)-k2,2nr
- 1st sort (score
), start with field2
, end with field2
, sort asn
umeric and inr
everse (aka descending) order-k3.5,3.8nr
- 2nd sort (YYYY
), start with field3
and5
th character, end with field3
and8
th character, sort asn
umeric and inr
everse order (-k3.3,3.4nr
and-k3.1,3.2nr
performr
eversen
umeric sorts onMM
andDD
, respectively)- NOTE: OP's expected output shows
ID=555
(YYYY=2014
) listed beforeID=444
(YYYY=2015
); I'm assuming this is a typo and thatID=444
should be listed first
Pulling Ed Morton's comments up into this answer ...
-b
discards all leading spaces (ie, replaces-t ' '
) but ...- the
-b
option must be applied at each key when applying flags at the key level; alternatively ... - the
-b
option can be applied as a global flag as long as there are no key-level flags (key-level flags take precedence over - and negate - global level flags)
Applying these rules we get one of the following:
# all global-level flags
sort -nrb -k2,2 -k3.5,3.8 -k3.3,3.4 -k3.1,3.2 raw.dat
# all key-level flags ('start' only needs '-b` flag)
sort -k2b,2bnr -k3.5b,3.8bnr -k3.3b,3.4bnr -k3.1b,3.2bnr raw.dat
# all key-level flags (overkill on the 'start' flags)
sort -k2bnr,2bnr -k3.5bnr,3.8bnr -k3.3bnr,3.4bnr -k3.1bnr,3.2bnr raw.dat
These all generate:
111 100 01012011
333 90 01012013
222 90 01012001
444 80 01012015
555 80 01012014
666 70 01012016
777 60 01012017
888 50 01012018
ddd 35 10122022
ccc 35 12112022
aaa 35 01082022
bbb 35 23012022
Answered By - markp-fuso Answer Checked By - Marie Seifert (WPSolving Admin)