AWK to replace HTML tag with another and keep text

Time:01-21

I am looking for a way to replace an HTML tag with another, but keep the text.

I have a big HTML file, which contains:

<span >fork</span>

I want to replace the <span> tag with a <strong> tag:

<strong>fork</strong>

The tool doesn't really matter, but I am looking for a CLI way to do it.

I am not looking for an HTML processor, because the input is a text file with some HTML code in it (not clean, valid HTML) and I work with the output manually (copy, modify, use later in its final place). I just want to save some time on the replacement.

CodePudding user response:

Consider using Python and a tool like BeautifulSoup to handle HTML. Trying to parse HTML with other tools like sed or awk can lead to terrible places.

As an example:

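from bs4 import BeautifulSoup

# Name the parser explicitly; without it bs4 warns and picks whichever parser is installed.
soup = BeautifulSoup('<li><span >fork</span>', 'html.parser')
# Renaming a Tag changes both its opening and its closing tag.
for span in soup.find_all('span'):
    span.name = 'strong'
html_string = str(soup)
print(html_string)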

This is lightweight and simple, and the HTML is handled by a library built specifically to parse it.

CodePudding user response:

Don't use AWK for processing HTML files. If you can turn your HTML file into an XHTML file, you can use xsltproc for an XML transformation as follows:

trans.xsl file:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes" encoding="utf-8"/>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="span[@class='desc e-font-family-cond']">
    <strong><xsl:apply-templates/></strong>
  </xsl:template>

</xsl:stylesheet>

CLI command for invoking xsltproc, which has to be installed, obviously:

xsltproc trans.xsl file.html

The command writes the transformed file to standard output, with the matching span elements replaced by strong elements.

CodePudding user response:

Using GNU sed (\? and \s are GNU extensions):

sed 's,<\(\/\)\?span\(\s\)\?,<\1strong\2,g'

$ echo '<span >fork</span>' | sed 's,<\(\/\)\?span\(\s\)\?,<\1strong\2,g'
<strong >fork</strong>
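
For comparison, the same substitution can be sketched in Python with only the standard library. This is a minimal sketch, not part of the answer above: the function name replace_tag and the sample string are made up for illustration. Unlike the sed pattern, it uses a word boundary after the tag name, so a tag such as <spanner> is left alone. Like sed, it is a plain text substitution and does not parse the HTML.

```python
import re


def replace_tag(text, old="span", new="strong"):
    """Rename <old ...> and </old> to <new ...> and </new>, keeping attributes and text."""
    # Group 1 captures the optional slash of a closing tag.
    # \b ends the match at the end of the tag name, so <spanner> does not match.
    pattern = re.compile(r"<(/?)" + re.escape(old) + r"\b")
    # A function replacement avoids any backslash handling in the new tag name.
    return pattern.sub(lambda m: "<" + m.group(1) + new, text)


# Hypothetical sample input, including a tag that must not be touched.
sample = "<li><span >fork</span> and <spanner>x</spanner>"
print(replace_tag(sample))
# prints: <li><strong >fork</strong> and <spanner>x</spanner>
```

Everything after the tag name is left untouched, so attributes and odd spacing survive the replacement exactly as they were in the input.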