Beautiful Soup extract string before tag

Time:01-26

I have an xml file that has ref tags nested inside para tags:

<para>here be text<ref> REF 1 </ref>and here be some more text</para>

Is there a way, using Beautiful Soup, to extract the string between the opening para tag and the opening ref tag, i.e.:

here be text

I've tried various things to no avail, including find_previous:

from bs4 import BeautifulSoup

# 'file' is the opened XML file (or a string of XML)
soup = BeautifulSoup(file, 'xml')

ref = soup.find('ref')
ref_before = ref.find_previous('para')

But (obviously) ref_before is the entire para tag, so its text is the whole paragraph, i.e.:

here be text REF 1 and here be some more text

I think this ought to be really simple but I don't have much experience and just can't crack it. Any help much appreciated.

CodePudding user response:

You can use .contents, which lists the children of para in order, and select the first element (the text node before ref):

soup.find('para').contents[0]

Output:

'here be text'
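
Note that contents[0] only works when the wanted text is the first child of para. As a rough sketch of an alternative that starts from the ref tag itself, previous_sibling returns whatever node sits directly before it. The helper name text_before is made up for this example, and "html.parser" is used here only so the snippet runs without lxml installed; the 'xml' parser from the question should behave the same way for this input.

```python
from bs4 import BeautifulSoup, NavigableString

xml = "<para>here be text<ref> REF 1 </ref>and here be some more text</para>"

# "html.parser" is an assumption for portability; the question uses 'xml' (needs lxml)
soup = BeautifulSoup(xml, "html.parser")


def text_before(tag):
    """Hypothetical helper: return the text node directly before `tag`, or ''."""
    prev = tag.previous_sibling
    # previous_sibling may be None or another Tag, so only accept plain strings
    return str(prev) if isinstance(prev, NavigableString) else ""


ref = soup.find("ref")
before = text_before(ref)
print(before)  # here be text
```

Looping over soup.find_all("ref") and calling the helper on each one would give the text before every ref, not just the first.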