I would like to use BeautifulSoup 4 (BS4) to strip embedded images to save space, while leaving the tag itself in place. For example, remove the base64 data but keep <img src="data:image/jpeg;base64,<DELETED>">.
I can do this to remove everything including the tag:
tags = soup.find_all('img')
for match in tags:
    match.decompose()
This removes everything, but I would like to keep the tag and drop only the binary data in its src attribute. Is that possible?
CodePudding user response:
Python 3:
from bs4 import BeautifulSoup

markup = """
<div>
<p>Take the red pill</p>
<img src="data:image/png;base64, iVBORw0KGgoAAAANSUhEUgAAAAUA
AAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO
9TXL0Y4OHwAAAABJRU5ErkJggg==" alt="Follow the white rabbit" />
</div>
"""
soup = BeautifulSoup(markup, 'html.parser')
tag = soup.img
tag['src'] = "data:image/jpeg;base64,"
print(tag)
Output:
<img alt="Follow the white rabbit" src="data:image/jpeg;base64,"/>
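Note that the snippet above edits only the first image and hardcodes the prefix, so an image/png source ends up labelled image/jpeg. A minimal sketch of a more general version, assuming you want every embedded image handled and its original MIME type kept; the helper name strip_embedded_images is hypothetical, not part of BeautifulSoup:

```python
from bs4 import BeautifulSoup


def strip_embedded_images(html):
    """Return html with base64 payloads removed from <img> data URIs."""
    soup = BeautifulSoup(html, "html.parser")
    # src=True skips <img> tags that have no src attribute at all.
    for img in soup.find_all("img", src=True):
        src = img["src"]
        # Only touch embedded images; ordinary URLs are left alone.
        if src.startswith("data:") and ";base64," in src:
            prefix, _, _payload = src.partition(";base64,")
            img["src"] = prefix + ";base64,"
    return str(soup)


html = (
    '<p>hi</p>'
    '<img src="data:image/png;base64,iVBORw0KGgo=" alt="a">'
    '<img src="pic.png">'
)
result = strip_embedded_images(html)
print(result)
```

The first image keeps its data:image/png;base64, prefix with an empty payload, and the second image, which points at a normal file, is unchanged.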
CodePudding user response:
Here is how I managed to do it. It turned out to be easy:
for match in tags:
    match['src'] = 'deleted'
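One caveat: this loop overwrites src on every img tag, including images that point at ordinary URLs. A hedged sketch of a narrower variant, assuming only data: URIs should be replaced, uses a CSS attribute-prefix selector through soup.select; the sample HTML is made up for illustration:

```python
from bs4 import BeautifulSoup

# Made-up sample: one embedded image and one ordinary image.
html = (
    '<img src="data:image/gif;base64,R0lGODlhAQABAAAAACw=">'
    '<img src="/static/logo.png">'
)
soup = BeautifulSoup(html, "html.parser")

# Select only <img> tags whose src starts with "data:".
for img in soup.select('img[src^="data:"]'):
    img["src"] = "deleted"  # same placeholder string as in the answer

sources = [img["src"] for img in soup.find_all("img")]
print(sources)
```

With this selector the linked logo keeps its original src, and only the embedded image is replaced by the placeholder.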
