I'm really confused because I don't think those are special characters. Either way, I tried escaping them with a backslash. I have a big text file that's basically HTML code, and I want to extract the text between some tags. I cropped a piece below:
b282yb keod5gw0 nxhoafnm aigsh9s9 d3f4x2em iv3no6db jq4qci2q a3bd9o3v lrazzd5p bwm1u5wc" dir="auto"><span >Text #1</span></a></div><div ></span></span></div> </div><div dir="auto">Text #2</span></a></div> <div ><span aahdfvyu">', f)
but it comes back with
['<span >Text #1', '</span></div></div><div dir="auto">Text #2']
so it doesn't remove everything before the string. Why?
CodePudding user response:
import re

text = """b282yb keod5gw0 nxhoafnm aigsh9s9 d3f4x2em iv3no6db jq4qci2q a3bd9o3v lrazzd5pbwm1u5wc" dir="auto"><span >Text #1</span></a></div><div ></span></span></div> </div><div dir="auto">Text #2</span></a></div><div ><span """
# [^<]+ captures one or more characters that are not '<', so the match cannot start before an earlier tag
re.findall(r'>([^<]+)</span></a></div><div >', text)
Result:
['Text #1', 'Text #2']
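As for the "why": the pattern in the question is cut off, so the following is only a guess at what happened, not a reconstruction of the original code. The sketch assumes the capture group was a wildcard such as `(.+?)`. The shortened `sample` string and the names `loose` and `tight` are made up for illustration. Even a lazy wildcard does not look for the closest `>`; the regex engine starts at the leftmost `>` from which the rest of the pattern can still succeed, so the capture can swallow earlier tags. A negated class like `[^<]+` cannot cross a `<`, which forces the capture to start right after the nearest `>`.

```python
import re

# Hypothetical, shortened sample; class names from the real page are omitted.
sample = (
    '<div dir="auto"><span >Text #1</span></a></div><div ></span></div> </div>'
    '<div dir="auto">Text #2</span></a></div><div ><span '
)

# Assumed version of the failing pattern: the lazy wildcard still begins at the
# leftmost '>' that lets the rest of the pattern match, so tags leak into the group.
loose = re.findall(r'>(.+?)</span></a></div>', sample)

# [^<]+ cannot contain '<', so the group holds only the text between two tags.
tight = re.findall(r'>([^<]+)</span></a></div>', sample)

print(loose)  # ['<span >Text #1', '</span></div> </div><div dir="auto">Text #2']
print(tight)  # ['Text #1', 'Text #2']
```

The `loose` result has the same shape as the output in the question, which is why this seems a plausible explanation. For anything beyond a quick extraction like this, a real HTML parser is usually more robust than a regex.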
