Tuesday, July 26, 2022

[SOLVED] How do I write a regex that matches only a specific selection between two tags?

Issue

Dozens of similar questions have been asked, but mine is about a specific selection between the tags. I don't want the entire selection from <a href to </a>; I only need to target the "> between those tags.

I am trying to convert a href links into wikilinks. For example, if the sample text has:

<a href="./light.html">Light</a> is light.

<div class="reasons">

I want to edit the file itself and change <a href="link.html">Link</a> into [[link.html|Link]]. The basic idea that I have right now uses 3 sed edits as follows:

  1. <a href="link.html">Link</a> -> <a href="link.html|Link</a>
  2. <a href="link.html|Link</a> -> [[link.html|Link</a>
  3. [[link.html|Link</a> -> [[link.html|Link]]

My problem lies with the first step; I can't find the regex that only targets "> between <a href and </a>.

I understand that the basic idea would be to put the search target between a lookbehind and a lookahead, but trying that on regexr failed. I also tried using a conditional regex. I can't find the syntax I used, but it either returned an error or it worked but also captured the div class.

Edit: I'm on Ubuntu and using sed in a bash script to do the text manipulation.


Solution

"The basic idea that I have right now uses 3 sed edits"

Assuming you've also read the answers underneath those dozens of similar questions, you could've known that it's a bad idea to parse HTML with sed (regex).

With an HTML parser like xidel this would be as simple as:

$ xidel -s '<a href="link.html">Link</a>' -e 'concat("[[",//a/@href,"|",//a,"]]")'
$ xidel -s '<a href="link.html">Link</a>' -e '"[["||//a/@href||"|"||//a||"]]"'
$ xidel -s '<a href="link.html">Link</a>' -e 'x"[[{//a/@href}|{//a}]]"'
[[link.html|Link]]

These are three different queries that concatenate strings, and each one prints the same output. The 1st query uses the XPath concat() function, the 2nd query uses the XPath || operator and the 3rd uses xidel's extended string syntax.
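
If you're stuck with sed anyway, the 3 edits from the question can be collapsed into a single substitution with two capture groups, so there is no need to target the "> on its own. This is only a rough sketch, not a recommendation. It assumes every link sits on one line, has a double-quoted href as its only attribute, and contains no nested tags. The function name to_wikilinks is made up for illustration.

```shell
#!/usr/bin/env bash

# Hypothetical helper: rewrites <a href="X">Y</a> as [[X|Y]].
# Assumptions: one link per match on a single line, href is the only
# attribute and is double-quoted, and the link text contains no tags.
to_wikilinks() {
  # Group 1 captures the href value, group 2 captures the link text.
  # '#' is used as the delimiter so the '/' in </a> needs no escaping.
  sed -E 's#<a href="([^"]*)">([^<]*)</a>#[[\1|\2]]#g' "$@"
}

# The sample text from the question; the div line has no <a href, so it is left alone.
printf '%s\n' '<a href="./light.html">Light</a> is light.' '<div class="reasons">' | to_wikilinks
```

Because the pattern only matches a complete <a href="...">...</a> pair, the div class is not touched. Any extra arguments are passed on to sed, so with GNU sed on Ubuntu something like to_wikilinks -i file.html should edit the file in place. As soon as the HTML deviates from the assumptions above (extra attributes, single quotes, links spanning several lines) this breaks, which is exactly why a parser is the safer choice.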



Answered By - Reino
Answer Checked By - Robin (WPSolving Admin)