Issue
I need some scripting help. I have a set of Python package files:
beautifulsoup4-4.12.2-py3-none-any.whl
certifi-2023.7.22-py3-none-any.whl
charset_normalizer-3.3.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
google-3.0.0-py2.py3-none-any.whl
idna-3.4-py3-none-any.whl
protobuf3-0.2.1.tar.gz
protobuf-3.19.6-py2.py3-none-any.whl
proton-0.9.1.tar.gz
python-qpid-proton-0.38.0.tar.gz
redis-4.5.5-py3-none-any.whl
requests-2.31.0-py3-none-any.whl
robotframework-6.1.1-py3-none-any.whl
robotframework_requests-0.9.1-py3-none-any.whl
robotframework-run-keyword-async-1.0.8.tar.gz
soupsieve-2.5-py3-none-any.whl
urllib3-2.0.7-py3-none-any.whl
I need to parse each file name to get the package name and its version. Some names parse correctly while others fail. The list above should yield the following name and version for each file:
beautifulsoup4 4.12.2
certifi 2023.7.22
charset-normalizer 3.3.1
google 3.0.0
idna 3.4
protobuf3 0.2.1
protobuf 3.19.6
proton 0.9.1
python-qpid-proton 0.38.0
redis 4.5.5
requests 2.31.0
robotframework 6.1.1
robotframework-requests 0.9.1
robotframework-run-keyword-async 1.0.8
soupsieve 2.5
urllib3 2.0.7
I've tried cut, grep, sed, and awk to get this working, but digits appearing in names, multi-digit versions, and inconsistent patterns cause one method or another to fail. You'll also notice that charset_normalizer and robotframework_requests need _ changed to - in the name, but I'm hoping those cases are infrequent and I can work around them when they happen.
I'm stuck on how to make this work. Any ideas? Here is my current script logic (fullName is a file name from the list above), but it fails for certifi, charset-normalizer, idna, soupsieve, and robotframework-requests.
version=`echo "$fullName" | sed -nre 's/^[^0-9]*(([0-9]+\.)*[0-9]+).*/\1/p'`
artifactId=`echo "$fullName" | sed -r "s/-${version}.*//g"`
Specifically, for the names that don't work, the current script builds the artifact and version as:
beautifulsoup4 4
protobuf3-0.2.1.tar.gz 3
urllib3-2.0.7-py3-none-any.whl 3
If anyone has a good way to parse the artifactId and version with a regex or any other bash scripting method, I'm open to trying anything.
Thanks
Solution
Using sed
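You could use 2 capture groups: the first captures the name, and the second captures a dotted number that comes after the last - which is followed by such a number, where the number itself is followed by either - or . (a hyphen or a dot).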
If there has to be at least a single dot in the version, you can change (\.[0-9]+)* to (\.[0-9]+)+
sed -E 's/^(.*)-([0-9]+(\.[0-9]+)*)[.-].*/\1 \2/' file | column -t
Output
beautifulsoup4 4.12.2
certifi 2023.7.22
charset_normalizer 3.3.1
google 3.0.0
idna 3.4
protobuf3 0.2.1
protobuf 3.19.6
proton 0.9.1
python-qpid-proton 0.38.0
redis 4.5.5
requests 2.31.0
robotframework 6.1.1
robotframework_requests 0.9.1
robotframework-run-keyword-async 1.0.8
soupsieve 2.5
urllib3 2.0.7
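For a script that handles one file name at a time (as with fullName in the question), the same expression could be wrapped in a small function. This is only a sketch under a few assumptions: parse_pkg is a made-up name, the -n option and p flag are added so that names that don't match print nothing, and the tr step assumes that turning _ into - is the only name normalization wanted.

```shell
#!/bin/sh
# parse_pkg is a hypothetical helper name, not part of any tool.
# It prints "name version" for one package file name, or nothing if the
# name has no "-<dotted number>" followed by "." or "-".
# The greedy (.*) backs off to the last "-" that starts such a number.
# Assumption: "_" in a name should be reported as "-" (done by tr).
parse_pkg() {
    printf '%s\n' "$1" |
        sed -nE 's/^(.*)-([0-9]+(\.[0-9]+)*)[.-].*/\1 \2/p' |
        tr '_' '-'
}

for fullName in \
    charset_normalizer-3.3.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl \
    protobuf3-0.2.1.tar.gz \
    urllib3-2.0.7-py3-none-any.whl
do
    result=$(parse_pkg "$fullName")
    artifactId=${result% *}   # text before the space
    version=${result##* }     # text after the space
    echo "$artifactId $version"
done
```

With the three sample names this should print charset-normalizer 3.3.1, protobuf3 0.2.1 and urllib3 2.0.7. Splitting the result with parameter expansion relies on package names never containing a space.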
Answered By - The fourth bird
Answer Checked By - Timothy Miller (WPSolving Admin)