I have this code in a file:
<pre dir="ltr" data-xf-init="code-block" data-lang=""><code>-Fix numcer one/Two
-EMM Support
-Fix update < broken
-Add support patch</code></pre>
</div>
</div><b><br />
I need to remove the surrounding characters and keep just these lines:
-Fix numcer one/Two
-EMM Support
-Fix update < broken
-Add support patch
I have tried this code:
#!/bin/bash
sed -n '/>-/,/</p' /home/Desktop/1 > /home/Desktop/2
sed -n '/^-*code>/p' /home/raed/Desktop/2 > /home/Desktop/3
sed -i 's#</code></pre>##' /home/Desktop/3
exit
But the code removes the first line:
-Fix numcer one/Two
CodePudding user response:
1st solution: With GNU awk. Based on your shown samples, please try the following awk code.
awk -v RS="^$" '
match($0,/(^|\n)<pre [^>]*".*<code>(-.*)<\/code>/,arr){
print arr[2]
}
' Input_file
Explanation: Setting RS to ^$ is a GNU awk idiom that makes awk read the whole file as a single record. The match function then applies the regex (^|\n)<pre [^>]*".*<code>(-.*)<\/code> to that record and stores the capture groups in the array arr. The regex has 2 capturing groups, so when it matches, arr[2] holds everything between <code> and </code>, and printing it gives the desired lines.
2nd solution: With GNU sed, using the -z and -E options, please try the following code.
sed -zE 's/(^|\n)<pre [^>]*".*<code>(-.*)<\/code>.*/\2/' Input_file
OR, if your sed version supports \n in the replacement, a slight change to the above adds a trailing newline to the output:
sed -zE 's/(^|\n)<pre [^>]*".*<code>(-.*)<\/code>.*/\2\n/' Input_file
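As a quick check of the whole-file sed approach above, here is a minimal sketch that builds a sample file from the question and runs the command on it. It assumes GNU sed (for -z), and the function name extract_with_sed_z and the temporary file are hypothetical, used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: treat the whole file as one record (-z, GNU sed only)
# and keep only the text between <code> and </code>.
extract_with_sed_z() {
    sed -zE 's/(^|\n)<pre [^>]*".*<code>(-.*)<\/code>.*/\2\n/' "$1"
}

# Sample input copied from the question (temporary file, removed afterwards).
sample=$(mktemp)
cat > "$sample" <<'EOF'
<pre dir="ltr" data-xf-init="code-block" data-lang=""><code>-Fix numcer one/Two
-EMM Support
-Fix update < broken
-Add support patch</code></pre>
</div>
</div><b><br />
EOF

extract_with_sed_z "$sample"
rm -f "$sample"
```

Because both .* parts are greedy, this sketch assumes the file holds a single <code>...</code> block; with several blocks it would keep everything from the last <code> to the last </code>.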
3rd solution: With GNU grep, please try the following code:
grep -zoP '(^|\n)<pre [^>]*".*?<code>\K-(.*?\n[^\n]+)+(?=</code>)' Input_file
CodePudding user response:
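Try this:
sed 's/<[^>]*>//g' <file
It removes every tag, that is, everything from a < up to the next > on the same line. A lone < with no > after it on that line (as in "-Fix update < broken") is left untouched.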
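If the file also contains lines outside the code block, the tag-stripping command above can be limited to the block itself. The following is a minimal sketch, assuming the opening <code> and the closing </code> are on different lines, as in the question's sample; the function name extract_code_block is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print only the lines from the one containing <code>
# through the one containing </code>, with all <...> tags stripped.
# Uses only POSIX sed features (no -z, no -E).
extract_code_block() {
    sed -n '/<code>/,/<\/code>/{
        s/<[^>]*>//g
        p
    }' "$1"
}

# Sample input copied from the question (temporary file, removed afterwards).
sample=$(mktemp)
cat > "$sample" <<'EOF'
<pre dir="ltr" data-xf-init="code-block" data-lang=""><code>-Fix numcer one/Two
-EMM Support
-Fix update < broken
-Add support patch</code></pre>
</div>
</div><b><br />
EOF

extract_code_block "$sample"
rm -f "$sample"
```

The address range selects the block line by line, so the trailing </div> lines are never printed, and the first line keeps its leading dash because only the tags are deleted.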
