How to extract the maximum number of results from pagination with BeautifulSoup?

Time:01-22

I am trying to select the pagination section and would like to extract the maximum number of results, 2143:

numbers = contents.find(name="div", attrs={"class": "pagination"})
print(numbers.attrs)
print(numbers)
print(numbers.get_text(' ', strip=True))

This code gives me the following output:

    {'class': ['pagination']}
    <div class="pagination"><span>Showing 1-30 of 2143</span><ul><li><div ></div></li><li><span >1</span></li><li><a data-analytics='{"click_id":132,"module":1,"listing_page":2}' data-page="2" data-remote="true" href="/san-francisco-ca/dentists?page=2">2</a></li><li><a data-analytics='{"click_id":132,"module":1,"listing_page":3}' data-page="3" data-remote="true" href="/san-francisco-ca/dentists?page=3">3</a></li><li><a data-analytics='{"click_id":132,"module":1,"listing_page":4}' data-page="4" data-remote="true" href="/san-francisco-ca/dentists?page=4">4</a></li><li><a data-analytics='{"click_id":132,"module":1,"listing_page":5}' data-page="5" data-remote="true" href="/san-francisco-ca/dentists?page=5">5</a></li><li><a  data-analytics='{"click_id":132}' data-page="2" data-remote="true" href="/san-francisco-ca/dentists?page=2">Next</a></li></ul></div>
    Showing 1-30 of 2143 1 2 3 4 5 Next

How can I extract only the 2143 from this text?

Showing 1-30 of 2143 1 2 3 4 5 Next

CodePudding user response:

Select your tag more specifically. One option is to use CSS selectors to chain conditions: select the first direct <span> child of the <div> with class pagination, split its text on spaces, and take the last element of the list:

soup.select_one('div.pagination > span').text.split(' ')[-1]

Example

from bs4 import BeautifulSoup

html = '''<div class="pagination"><span>Showing 1-30 of 2143</span><ul><li><div ></div></li><li><span >1</span></li><li><a data-analytics='{"click_id":132,"module":1,"listing_page":2}' data-page="2" data-remote="true" href="/san-francisco-ca/dentists?page=2">2</a></li><li><a data-analytics='{"click_id":132,"module":1,"listing_page":3}' data-page="3" data-remote="true" href="/san-francisco-ca/dentists?page=3">3</a></li><li><a data-analytics='{"click_id":132,"module":1,"listing_page":4}' data-page="4" data-remote="true" href="/san-francisco-ca/dentists?page=4">4</a></li><li><a data-analytics='{"click_id":132,"module":1,"listing_page":5}' data-page="5" data-remote="true" href="/san-francisco-ca/dentists?page=5">5</a></li><li><a  data-analytics='{"click_id":132}' data-page="2" data-remote="true" href="/san-francisco-ca/dentists?page=2">Next</a></li></ul></div>'''

soup = BeautifulSoup(html, 'lxml')

# The first direct <span> child of div.pagination holds "Showing 1-30 of 2143"
print(soup.select_one('div.pagination > span').text.split(' ')[-1])

Output

2143
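
Splitting on spaces relies on the number being the last word of the text. If the text might carry trailing words or a thousands separator, a regular expression is one possible alternative. This is only a sketch: the helper name parse_total is hypothetical, and the "of <number>" wording is assumed from the sample output in the question.

```python
import re

def parse_total(text):
    """Return the total from text like 'Showing 1-30 of 2143', or None if absent."""
    # Assumption: the total always follows the word "of" and may contain commas.
    match = re.search(r"\bof\s+(\d[\d,]*)", text)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))

print(parse_total("Showing 1-30 of 2143"))
print(parse_total("Showing 1-30 of 2143 1 2 3 4 5 Next"))
```

Both calls print 2143, so this also works on the full text of the pagination <div> from the question.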

CodePudding user response:

Instead of numbers.get_text, find the <span>, get its text, split it once from the right with rsplit, and take the second element:

out = numbers.find('span').text.rsplit(' ', 1)[1]

Output:

'2143'
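
Note that the result is a string. If the goal is to loop over all result pages, it may help to convert it to an integer and derive the page count. A minimal sketch, assuming the "1-30" range in the text means 30 results per page (the variable names are illustrative only):

```python
import math

text = "Showing 1-30 of 2143"  # text of the <span>, as in the question

# Same rsplit approach as above, converted to an integer.
total = int(text.rsplit(" ", 1)[1])

# Assumption: the second word is a "first-last" range for the current page.
first, last = text.split(" ")[1].split("-")
per_page = int(last) - int(first) + 1

# Round up so a partially filled last page is counted.
pages = math.ceil(total / per_page)
print(total, per_page, pages)
```

For the sample text this prints 2143 30 72.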