I am looking for a way to replace one HTML tag with another while keeping the text.
I have a big HTML file, which contains:
<span >fork</span>
I want to replace the <span> tag with a <strong> tag:
<strong>fork</strong>
The tool doesn't really matter, but I am looking for a CLI way to do it.
I am not looking for an HTML processor, because the input is a text file with some HTML code in it (not clean/valid HTML) and I work with the output manually (copy, modify, use later in its final place). I just want to save some time on the replacement.
CodePudding user response:
Consider using Python and a tool like BeautifulSoup to handle HTML. Trying to parse HTML with other tools like sed or awk can lead to terrible places.
As an example:
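from bs4 import BeautifulSoup

# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup('<li><span >fork</span>', 'html.parser')
# Rename every <span> element to <strong>; the text inside is kept
for spanele in soup.find_all('span'):
    spanele.name = 'strong'
html_string = str(soup)
print(html_string)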
That's lightweight and pretty simple, and the HTML is handled properly by a library that is built specifically to parse it.
CodePudding user response:
Don't use AWK for processing HTML files. If you can turn your HTML file into an XHTML file, you can use xsltproc for an XML transformation as follows:
trans.xsl file:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" encoding="utf-8"/>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="span[@class='desc e-font-family-cond']">
    <strong><xsl:apply-templates/></strong>
  </xsl:template>
</xsl:stylesheet>
CLI command for invoking xsltproc, which has to be installed, obviously:
xsltproc trans.xsl file.html
The command writes the transformed file to standard output, with every matching <span> element replaced by a <strong> element.
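If xsltproc is not at hand, roughly the same rename can be sketched in Python with the standard library's xml.etree.ElementTree. This is only a sketch under assumptions: the input must be well-formed XML without a default namespace, and the function name rename_spans and its css_class parameter are made up for this example. Like the stylesheet above, it drops the attributes of the renamed element.

```python
import xml.etree.ElementTree as ET


def rename_spans(xml_text, css_class="desc e-font-family-cond"):
    """Rename <span class=css_class> elements to <strong> (hypothetical helper)."""
    root = ET.fromstring(xml_text)  # raises ParseError if the input is not well-formed
    for el in root.iter("span"):
        if el.get("class") == css_class:
            el.tag = "strong"
            el.attrib.clear()  # the XSLT template does not copy attributes either
    return ET.tostring(root, encoding="unicode")


print(rename_spans('<li><span class="desc e-font-family-cond">fork</span></li>'))
```

Spans with a different class value are left untouched, which mirrors the match expression in trans.xsl.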
CodePudding user response:
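Using sed (GNU sed is needed, because \? and \s are GNU extensions; note that the pattern also rewrites any other tag whose name starts with "span"):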
sed 's,<\(\/\)\?span\(\s\)\?,<\1strong\2,g'
$ echo '<span >fork</span>' | sed 's,<\(\/\)\?span\(\s\)\?,<\1strong\2,g'
<strong >fork</strong>
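The same substitution can be sketched in Python with the re module, which may be handy where GNU sed is not available. This is a plain text replacement, not HTML parsing, so it fits input that is not valid HTML. The names SPAN_TAG and span_to_strong are made up for this example, and the \b word boundary is an addition that keeps tag names merely starting with "span" from being rewritten.

```python
import re

# Matches "<span" or "</span" as a whole tag name; group 1 captures the optional slash
SPAN_TAG = re.compile(r"<(/?)span\b")


def span_to_strong(text):
    """Replace opening and closing span tags with strong, keeping attributes and text."""
    return SPAN_TAG.sub(r"<\1strong", text)


print(span_to_strong("<span >fork</span>"))  # prints <strong >fork</strong>
```

The match is case-sensitive, so an upper-case <SPAN> would be left alone unless the pattern is compiled with re.IGNORECASE.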
