I'm using BeautifulSoup, and I need to get the xxx string from the following line:
from bs4 import BeautifulSoup
soup = BeautifulSoup('<p > xxx <br/> yyy <br/></p>',
'html.parser')
Usually, I would do the following:
one_a_tag = soup.p
t = one_a_tag.string
t
But that doesn't work: it returns None. However, if I delete <br/> yyy <br/>, the code works. How do I extract xxx from the original markup?
CodePudding user response:
Try using .strings, which yields every text node inside the tag:
x_and_y = list(soup.p.strings)
print(x_and_y)
Output: [' xxx ', ' yyy ']
.strings is a generator, so the list() call is needed to see all the values at once, but you can also iterate over it with a for loop.
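A minimal sketch of the loop form, reusing the markup from the question. It swaps in .stripped_strings, which behaves like .strings but strips surrounding whitespace and skips whitespace-only strings; the names parts and first are only illustrative.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p > xxx <br/> yyy <br/></p>', 'html.parser')

# Iterate over the generator instead of wrapping it in list()
parts = []
for s in soup.p.stripped_strings:
    parts.append(s)

# The first text node is the value the question asks for
first = parts[0]
print(parts)  # ['xxx', 'yyy']
print(first)  # xxx
```

If only the first value is needed, next(soup.p.stripped_strings) avoids building the list at all.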
CodePudding user response:
I'm getting the output as follows:
from bs4 import BeautifulSoup
soup = BeautifulSoup('<p > xxx <br/> yyy <br/></p>','html.parser')
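# The <p> in this snippet has no class attribute, so select it by tag name
text = soup.select_one('p').text
# split() breaks the text on whitespace; the first token is xxx
print(text.split()[0])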
Output:
xxx
CodePudding user response:
You get None from .string because, as the documentation explains:
If a tag contains more than one thing, then it's not clear what
.string should refer to, so .string is defined to be None
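The quoted rule can be checked directly. This is a small sketch comparing a <p> with a single text child against the <p> from the question; the names single and multi are made up for the illustration.

```python
from bs4 import BeautifulSoup

# One child (a single text node): .string returns that text
single = BeautifulSoup('<p> xxx </p>', 'html.parser')
# Several children (text, <br/>, text, <br/>): .string is None
multi = BeautifulSoup('<p> xxx <br/> yyy <br/></p>', 'html.parser')

single_string = single.p.string
multi_string = multi.p.string

print(single_string is None)  # False
print(multi_string is None)   # True
```

This matches what the question observed: removing <br/> yyy <br/> leaves a single child, so .string starts returning a value.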
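So, to get the text xxx from your example, you can call .find() with text=True, which returns the first text node inside the tag (since Beautiful Soup 4.4.0 the same argument is also available as string=True):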
from bs4 import BeautifulSoup
soup = BeautifulSoup(
'<p > xxx <br/> yyy <br/></p>', 'html.parser'
)
one_a_tag = soup.p
print(one_a_tag.find(text=True).strip())
