Wednesday, December 29, 2021

[SOLVED] Sort result is different between command line and text editor (vim)

Issue

I'm new to shell scripting (the command line).
I usually type only single-line commands, but today I got different results from the command-line sort and from sorting in a text editor.

In short, I want to know why the command-line "sort" behaves differently from vim's ":sort".

Question and details of my situation.

I have a sample log (text) file like the one below.

// log.txt
2021-04-12 10:00:00 [USER1000] login
2021-04-12 10:01:00 [USER1100] login
2021-04-12 10:02:00 [USER1010] login
2021-04-12 10:03:00 [USER1000] logout
2021-04-12 10:04:00 [USER1000] login
2021-04-12 10:05:00 [USER2000] login
2021-04-12 10:06:00 [USER1000] logout
2021-04-12 10:07:00 [USER1100] logout
2021-04-12 10:08:00 [USER1000] login
...

I want to know who logged in, and how many times, in one day.

So I use cat, grep, sort, and uniq for this.

cat log.txt | grep "login" | grep -o "\[USER....\]" | sort | uniq -c | sort > login.txt

I think it returns the right result, but the order was different from what I expected.

The steps below are what I expected.

  • 1st, cat log.txt prints all of the lines.
2021-04-12 10:00:00 [USER1000] login
2021-04-12 10:01:00 [USER1100] login
2021-04-12 10:02:00 [USER1010] login
2021-04-12 10:03:00 [USER1000] logout
2021-04-12 10:04:00 [USER1000] login
2021-04-12 10:05:00 [USER2000] login
2021-04-12 10:06:00 [USER1000] logout
2021-04-12 10:07:00 [USER1100] logout
2021-04-12 10:08:00 [USER1000] login
  • 2nd, grep "login" keeps only the "login" lines.
2021-04-12 10:00:00 [USER1000] login
2021-04-12 10:01:00 [USER1100] login
2021-04-12 10:02:00 [USER1010] login
2021-04-12 10:04:00 [USER1000] login
2021-04-12 10:05:00 [USER2000] login
2021-04-12 10:08:00 [USER1000] login
  • 3rd, to group per user, use grep -o to extract only the user tag.
[USER1000]
[USER1100]
[USER1010]
[USER1000]
[USER2000]
[USER1000]
  • 4th, sort all of the "login" entries so that uniq -c can count them.
[USER1000]
[USER1000]
[USER1000]
[USER1010]
[USER1100]
[USER2000]
  • 5th, uniq -c groups identical lines and counts them.
3 [USER1000]
1 [USER1010]
1 [USER1100]
1 [USER2000]
  • 6th, sort once more to find out who logged in the most.

For this step, I am showing real output, which is not related to the sample above.

  1 [USER1000]
 11 [USER1001]
  2 [USER1002]
237 [USER1003]
  4 [USER1005]
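
The first five steps can be reproduced on the nine sample lines with a sketch like the one below. The here-document stands in for log.txt, and the variable names log and counts are only illustrative.

```shell
# Stand-in for log.txt: the nine sample lines from the question.
log=$(cat <<'EOF'
2021-04-12 10:00:00 [USER1000] login
2021-04-12 10:01:00 [USER1100] login
2021-04-12 10:02:00 [USER1010] login
2021-04-12 10:03:00 [USER1000] logout
2021-04-12 10:04:00 [USER1000] login
2021-04-12 10:05:00 [USER2000] login
2021-04-12 10:06:00 [USER1000] logout
2021-04-12 10:07:00 [USER1100] logout
2021-04-12 10:08:00 [USER1000] login
EOF
)

# Steps 2-5: keep login lines, extract the user tag, sort, then count.
# A bare ] is literal outside a bracket expression, so it needs no backslash.
counts=$(printf '%s\n' "$log" | grep "login" | grep -o '\[USER....]' | sort | uniq -c)

printf '%s\n' "$counts"
```

uniq -c pads each count with leading spaces, and the exact width depends on the implementation. That right alignment is what matters in the 6th step.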


It looks like it is sorted as text, not as numbers.
So I changed the bash command:

# cat log.txt | grep "login" | grep -o "\[USER....\]" | sort | uniq -c | sort > login.txt
cat log.txt | grep "login" | grep -o "\[USER....\]" | sort | uniq -c > login.txt
vim login.txt
# in vim, :sort returns the result I want (sorted by number)

There is no problem now, but I just want to know why the results differ.
Can I solve this with the command-line sort too?



I am adding pictures of my test code in response to the comments.

When I wrote this question, I got the right result with vim's :sort command.

What I did in vim

All test results are below.

sort -n gives me the result I want.


Solution

In short, I want to know why the command-line "sort" behaves differently from vim's ":sort".

The vim :sort command relies on the sort function of a library used by vim, not on the OS sort command, so the two can order the same lines differently. In your case :sort happened to produce the numerical order you wanted. The :help sort text says:

 The details about sorting depend on the library function used.  There
 is no guarantee that sorting obeys the current locale.  You will have
 to try it out. Vim does do a "stable" sort.
 
 The sorting can be interrupted, but if you interrupt it too late in
 the process you may end up with duplicated lines.  This also depends
 on the system library function used.
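
A likely reason for the difference, offered here as an assumption rather than something the help text confirms: uniq -c right-aligns its counts with leading spaces, and a plain byte-by-byte comparison puts a space before any digit, so right-aligned numbers come out in numeric order. As far as I know, vim's :sort compares this way by default, while the OS sort follows the current locale, and many UTF-8 locales give little or no weight to leading blanks. The sketch below forces byte order with LC_ALL=C on the real-case counts from the question; the variable names are illustrative.

```shell
# Real-case counts from the question, right-aligned as uniq -c prints them.
counts=$(cat <<'EOF'
  1 [USER1000]
 11 [USER1001]
  2 [USER1002]
237 [USER1003]
  4 [USER1005]
EOF
)

# Byte-order comparison: a space (0x20) sorts before any digit,
# so "  2" comes before " 11", which comes before "237".
byte_sorted=$(printf '%s\n' "$counts" | LC_ALL=C sort)

printf '%s\n' "$byte_sorted"
```

With LC_ALL=C the counts come out as 1, 2, 4, 11, 237. This only works while the counts stay right-aligned in a column of the same width, so sort -n remains the more robust choice.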

You can use the OS sort command from inside vim with :%!sort to get the same sort order as the OS command.

To sort numerically with the OS command, use the -n option:

cat log.txt | grep "login" | grep -o "\[USER....\]" | sort -n | uniq -c | sort -n
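
As a small check of the -n option, the sketch below sorts the real-case counts from the question numerically, and then in reverse so that the user with the most logins comes first. The -r flag and the variable names are additions for illustration, not part of the command above.

```shell
# Real-case counts from the question, as uniq -c would print them.
counts=$(cat <<'EOF'
  1 [USER1000]
 11 [USER1001]
  2 [USER1002]
237 [USER1003]
  4 [USER1005]
EOF
)

# -n compares the leading number by value and ignores leading blanks.
ascending=$(printf '%s\n' "$counts" | sort -n)

# -r reverses the order, so the largest count is on the first line.
descending=$(printf '%s\n' "$counts" | sort -rn)

printf '%s\n' "$ascending"
printf '%s\n' "$descending" | head -n 1
```

The first sort in the pipeline only has to bring identical user tags together for uniq -c, so it is the final sort, the one that sees the counts, where -n changes the result.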


Answered By - Zilog80