Home > Blockchain >  Replacing a bs4 element with a string
Replacing a bs4 element with a string

Time:01-27

So I have a HTML document, where I want to add HTML anchor link tags so that I can easily go to a certain part of a webpage.

The first step is to find all divs that need to replaced. Secondly, an anchor link tag needs to be added, based on the text that is within the div. My code looks as follows:

from bs4 import BeautifulSoup
path= "/text.html"

with open(path) as fp:
    soup = BeautifulSoup(fp, 'html.parser')

mydivs = soup.find_all("p", {"class": "tussenkop"})
    
for div in mydivs:
    if "Artikel" in div.getText():
        string = div.getText().split()[1]
        div_id = f"""<a id="{string}"></a>{div}"""
        full =f"{div_id}{div}"
        html_soup = BeautifulSoup(full, 'html.parser')
        div = html_soup

A div looks as follows:

<p ><strong >Artikel 7.37 text text text</strong></p>

After adding the anchor tag it becomes:

<a id="7.37"></a><p ><strong >Artikel 10.6 Inwerkingtreding</strong></p><p ><strong >Artikel 7.37 text text text</strong></p>

But the problem is, div is not replaced by the new div. How should I correct this? Or is there another way to insert an anchor tag?

CodePudding user response:

I'm not quite sure what your expected output to look like, but BeautifulSoup has methods to create new tags and attributes, and insert them into the soup object.

from bs4 import BeautifulSoup

fp = '<p ><strong >Artikel 7.37 text text text</strong>'

soup = BeautifulSoup(fp, 'html.parser')
print('soup before: ', soup)

mydivs = soup.find_all("p", {"class": "tussenkop"})
    
for div in mydivs:
    if "Artikel" in div.getText():
        a_string = div.getText().split()[1]
        new_tag = soup.new_tag("a")
        new_tag['id'] = f'{a_string}'
        div.insert_before(new_tag)
        
print('soup after: ', soup)

Output:

soup before:  <p ><strong >Artikel 7.37 text text text</strong></p>
soup after:  <a id="7.37"></a><p ><strong >Artikel 7.37 text text text</strong></p>
  •  Tags:  
  • Related