Monday, October 10, 2022

[SOLVED] Find latest link from an HTML page listing download locations

Issue

I'm trying to build an equivalent to the following GitHub-specific code that finds the latest artifact available for download from https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master. The download links look something like https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master/5901-5db768d8bbb973ba27c81e424aea2910144a3100/fx.tar.xz.

# Working code for github.com, needs to be converted to fivem.net
LOCATION=$(curl -s https://api.github.com/repos/someuser/somerepo/releases/latest \
| grep "tag_name" \
| awk '{print "https://github.com/someuser/somerepo/archive/" substr($2, 2, length($2)-3) ".zip"}') \
; curl -L -o file.zip "$LOCATION"

Each download directory is named with a version number that increases from build to build but is not sequential, followed by what looks like a completely random hash.

How can I find the latest download link from the HTML page at https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master?


Solution

We can build on the use of lynx -dump, as suggested in "Easiest way to extract the urls from an html page using sed or awk only":

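#!/usr/bin/env bash

# Capture group 1 is the build number; group 2 is the hash.
# The dots are escaped so they only match literal dots.
url_re='https://runtime\.fivem\.net/artifacts/fivem/build_proot_linux/master/([[:digit:]]+)-([[:xdigit:]]+)/fx\.tar\.xz'
newest_link_num=0
newest_link_content=
# lynx prints each link as "  N. URL"; discard the index and keep the URL.
while read -r _ link; do
  [[ $link =~ $url_re ]] || continue
  # Keep the link with the numerically highest build number seen so far.
  if (( ${BASH_REMATCH[1]} > newest_link_num )); then
    newest_link_num=${BASH_REMATCH[1]}
    newest_link_content=$link
  fi
done < <(lynx -dump -listonly -hiddenlinks=listonly https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master)

echo "Newest link is: $newest_link_content"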

As of this writing, it finishes with the following output:

Newest link is: https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master/5901-5db768d8bbb973ba27c81e424aea2910144a3100/fx.tar.xz
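
The GitHub snippet in the question also downloaded the file, so a pipeline-style variant closer to that original may be useful. The following is only a sketch, not part of the original answer: the function names newest_fx_link and download_newest_fx and the output filename fx.tar.xz are assumptions made for illustration, and it assumes a grep that supports -o (GNU, BSD, and BusyBox grep do).

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads `lynx -dump -listonly` output on stdin and
# prints the fx.tar.xz link with the numerically highest build number.
# Prints nothing if no matching link is found.
newest_fx_link() {
  # 1. Pull out only the download URLs.
  # 2. Prefix each URL with its build number (the part before the "-"
  #    in the second-to-last path component).
  # 3. Sort numerically, keep the last (largest), and drop the prefix.
  grep -Eo 'https://runtime\.fivem\.net/artifacts/fivem/build_proot_linux/master/[0-9]+-[0-9a-fA-F]+/fx\.tar\.xz' |
    awk -F/ '{ split($(NF-1), part, "-"); print part[1], $0 }' |
    sort -k1,1n |
    tail -n 1 |
    cut -d' ' -f2-
}

# Hypothetical wrapper: fetches the index with lynx, picks the newest
# link, and downloads it with curl. Requires network access.
download_newest_fx() {
  local index_url='https://runtime.fivem.net/artifacts/fivem/build_proot_linux/master'
  local link
  link=$(lynx -dump -listonly -hiddenlinks=listonly "$index_url" | newest_fx_link)
  if [ -z "$link" ]; then
    echo 'No fx.tar.xz link found' >&2
    return 1
  fi
  # -f: fail on HTTP errors; -L: follow redirects; -o: output filename.
  curl -fL -o fx.tar.xz "$link"
}
```

Calling download_newest_fx would then save the newest build as fx.tar.xz in the current directory. The numeric sort matters here: a plain text sort would rank a build such as 999 above 5901.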



Answered By - Charles Duffy
Answer Checked By - Marilyn (WPSolving Volunteer)