Assume I have an HTML file like this:
<body>
<div id="a">
content of div a
<div id="b"> content of div b </div>
<div id="c"> content of div c </div>
</div>
<style>
#a {color: red; }
#b {color: green; }
#c {color: blue; }
</style>
</body>
I want to append a unique suffix (say, -suffix) to all ids, which would include attributes id="..." and selectors #..., and result in a file like this:
<body>
<div id="a-suffix">
content of div a
<div id="b-suffix"> content of div b </div>
<div id="c-suffix"> content of div c </div>
</div>
<style>
#a-suffix {color: red; }
#b-suffix {color: green; }
#c-suffix {color: blue; }
</style>
</body>
How do I accomplish this with standard unix shell tools like sed, grep, awk in a way that would cover as many situations as possible?
My attempt:
I came up with the following sed command:
sed -e 's/id="\([-_a-zA-Z0-9]*\)"/id="\1-suffix"/g;s/#\([-_a-zA-Z0-9]*\)/#\1-suffix/g' index.html
Which is actually two commands in one:
s/id="\([-_a-zA-Z0-9]*\)"/id="\1-suffix"/g - substitutes id attributes id="..."
s/#\([-_a-zA-Z0-9]*\)/#\1-suffix/g - substitutes id selectors #...
However, it's far from perfect. First, it only supports attribute values in double quotes (id="..."), and id values are limited to those matching [-_a-zA-Z0-9]*. Second, it clashes with hex colors, so a white color like #ffffff would become #ffffff-suffix. An id selector like #... should only get a suffix if a matching attribute id="..." exists.
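To make the hex color clash concrete, here is a minimal reproduction. The one-line input is a made-up example, not part of the file above, and only the second substitution is applied:

```shell
# Hypothetical one-line input: an id selector followed by a hex color.
printf '%s\n' '#a {color: #ffffff; }' |
    sed -e 's/#\([-_a-zA-Z0-9]*\)/#\1-suffix/g'
# prints: #a-suffix {color: #ffffff-suffix; }
```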
What is the best way to accomplish this?
CodePudding user response:
There are a lot of cases to handle in your file, as you mentioned with the colour problem. My approach would be to process the file line by line using
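# IFS= and -r keep leading whitespace and backslashes in each line intact.
while IFS= read -r a; do
    # ... modify "$a" here ...
    printf '%s\n' "$a"
done < inputfile.html > outputfile.html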
This being said, you may use
b=$(expr "$a" : "regex")
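to precisely select what you want to modify (expr prints the text captured by \(...\) in the regex, or the number of characters matched if there is no group), and only then use sed on $b to get what you want and put the result back into $a.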
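
A different route from the line-by-line loop, sketched here under stated assumptions rather than as a definitive solution: first collect the ids that are actually declared, then generate one sed script that rewrites only those ids. The function name add_id_suffix is hypothetical. The sketch assumes id values match [-_a-zA-Z0-9]*, that the suffix contains no characters special to sed (such as / or &), and that grep supports -o (GNU and BSD grep do, but it is not required by POSIX).

```shell
# add_id_suffix FILE SUFFIX   (hypothetical helper name)
# Prints FILE with SUFFIX appended to every declared id and to the
# matching #id selectors. Anything after # that is not a declared id
# (for example a hex color) is left alone.
add_id_suffix() {
    file=$1
    suffix=$2
    q="[\"']"   # bracket expression matching either quote character

    # Pass 1: collect the declared ids, single- or double-quoted.
    ids=$(grep -o "id=${q}[-_a-zA-Z0-9]*${q}" "$file" |
          sed "s/^id=${q}//; s/${q}\$//" | sort -u)

    # Pass 2: emit three sed commands per id.
    script=$(printf '%s\n' "$ids" | while IFS= read -r id; do
        [ -n "$id" ] || continue
        # id="x" or id='x' (the closing quote must equal the opening one)
        printf 's/id=\\(%s\\)%s\\1/id=\\1%s%s\\1/g\n' "$q" "$id" "$id" "$suffix"
        # #x followed by a character that cannot be part of an id
        printf 's/#%s\\([^-_a-zA-Z0-9]\\)/#%s%s\\1/g\n' "$id" "$id" "$suffix"
        # #x at the very end of a line
        printf 's/#%s$/#%s%s/\n' "$id" "$id" "$suffix"
    done)

    sed -e "$script" "$file"
}
```

Usage would be something like add_id_suffix index.html -suffix > out.html. Known gaps: an id that happens to look like a hex color used in the same file (say id="fff" next to color: #fff) still clashes, attributes that merely end in id= (such as data-id="...") are treated like id, and unquoted attribute values are not handled. For anything beyond this, a real HTML/CSS parser is likely the safer tool.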
