Issue
I'm trying to work out a Linux one-liner that lists only the files in a directory that are earlier-version duplicates of other files there, e.g.:
filenames:
foo-bar-foo-1.3.42.jar
foo-bar-foo-1.2.21.jar
foo-2.3-foo-bar-3.1.2.jar
foo-2.3-foo-bar-3.2.4.jar
bar-foo-1.24.jar
bar-foo-2.0.jar
foobar-foobar-3.4.1.jar
barfoo-barfoo-1.2.1.jar
expected output:
foo-bar-foo-1.2.21.jar
foo-2.3-foo-bar-3.1.2.jar
bar-foo-1.24.jar
This is similar to the question https://unix.stackexchange.com/questions/185193/remove-the-low-version-number-of-file , but that one relies on being able to set the field separator to the first dash, and my filenames have at least two dashes. I've had some limited success trying to tweak it like this:
ls -vr *.jar | awk -F-[0-9]+.[0-9]+.[0-9]+ '$1 == name{system ("ls \""$0"\"")}{name=$1}'
but it misses files with only two numbers in the version.
Using this instead gets caught up on files like foo-2.3-foo-bar-3.1.2.jar:
ls -vr *.jar | awk -F-[0-9]+.[0-9]+ '$1 == name{system ("ls \""$0"\"")}{name=$1}'
I can also use gsub to get a variable that contains everything but the version number, but I can't figure out how to use it to ultimately get my expected results.
ls -vr *.jar | awk -F- '{gsub("-"$NF,"",$x)}{print $x}'
I am open to not using awk if there's a better solution (I'm not terribly familiar with it). I'm working on RHEL in bash with sed also available. However, it must be a one-liner that can be used directly on the command line.
Solution
GNU sort has a --version-sort (-V) option, which is the hero here.
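To see why version sort matters here, the following minimal sketch compares a plain byte-wise sort with sort -V. The file names are made up for illustration (they are not from the question), and it assumes GNU sort is installed.

```shell
# Hypothetical sample names, chosen so that lexical and version order differ.
names='foo-1.10.jar
foo-1.9.jar
bar-foo-2.0.jar
bar-foo-1.24.jar'

# Byte-wise sort compares character by character, so "1.10" lands before "1.9".
lexical=$(printf '%s\n' "$names" | LC_ALL=C sort)

# Version sort compares runs of digits as numbers, so 9 comes before 10.
versioned=$(printf '%s\n' "$names" | sort -V)

printf 'lexical:\n%s\n\nversion:\n%s\n' "$lexical" "$versioned"
```

With version sort, files that share a prefix end up next to each other in ascending version order, which is what lets awk compare each line only with the one before it.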
#!/bin/bash
# let awk remember the previous file prefix (p1) and previous file name (f1)
# if the current prefix (p2) matches the previous prefix (p1), then
# print the previous filename (f1)
awk '{
    # remember the previous values
    p1 = p2
    f1 = f2
    # save the current filename
    f2 = $0
    # strip the version and extension, e.g. "1.3.42.jar"
    sub(/[0-9.]+\.[a-z]+$/, "")
    # save what is left as the current prefix
    p2 = $0
    if (p1 == p2) {
        # same prefix as the previous line, so the previous file is an older version
        print f1
    }
}' <(sort --version-sort <(for f in *.jar; do echo "$f"; done))
And now for the one-liner :)
awk '{p1=p2; f1=f2; f2=$0; sub(/[0-9.]+\.[a-z]+$/, ""); p2=$0; if (p1==p2) {print f1}}' <(sort -V <(for f in *.jar; do echo "$f"; done))
Results:
bar-foo-1.24.jar
foo-2.3-foo-bar-3.1.2.jar
foo-bar-foo-1.2.21.jar
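To reproduce this without touching real files, here is a sketch that recreates the sample names from the question as empty files in a scratch directory. The function name list_older_versions and the variable names are hypothetical, and the pipeline uses plain pipes instead of process substitution, but the logic is the same as above. It assumes GNU sort and mktemp are available.

```shell
# list_older_versions: hypothetical helper; prints every *.jar in the current
# directory that has a newer sibling with the same prefix.
list_older_versions() {
    printf '%s\n' *.jar | sort -V | awk '{
        prev_prefix = prefix; prev_file = file   # remember the previous line
        file = $0
        sub(/[0-9.]+\.[a-z]+$/, "")              # strip version and extension
        prefix = $0
        if (prev_prefix == prefix) print prev_file
    }'
}

# Recreate the sample files from the question in a scratch directory.
demo_dir=$(mktemp -d)
for f in foo-bar-foo-1.3.42.jar foo-bar-foo-1.2.21.jar \
         foo-2.3-foo-bar-3.1.2.jar foo-2.3-foo-bar-3.2.4.jar \
         bar-foo-1.24.jar bar-foo-2.0.jar \
         foobar-foobar-3.4.1.jar barfoo-barfoo-1.2.1.jar; do
    : > "$demo_dir/$f"
done

# Run the helper inside the scratch directory, then clean up.
older=$(cd "$demo_dir" && list_older_versions)
printf '%s\n' "$older"
rm -rf "$demo_dir"
```

Note that if no .jar files exist, the unexpanded pattern *.jar is passed through literally, just as in the for loop above.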
Answered By - Cole Tierney