Getting same response thrice using beautifulsoup

Time:02-08

I'm trying to grab the email address from this webpage for a school project. I can successfully get it along with the name of the organization, but now I have a new problem: it looks like everything is grabbed three times, which causes problems in my lists. Removing duplicates in post-processing is not ideal in this situation. Does anybody have an idea how I can grab the email and organization name just once?

Is it an issue with my for loop?

Code:

U4Etest1 = ['http://www.usavolleyballclubs.com/VolleyballClubDirectory.asp?Customer_ID=26045','http://www.usavolleyballclubs.com/VolleyballClubDirectory.asp?Customer_ID=36914']
email_list = []
org_name_list = []

for u in U4Etest1:
    url2 = u
    driver.get(url2)
    time.sleep(3)
    html = urlopen(url2)
    soup = BeautifulSoup(html, 'lxml')
    emailsoup = soup.find('table', class_="table table-striped")
    
    for es in emailsoup:
        org_name2 = emailsoup.find('h3').text
        org_name_list.append(org_name2)
        
        try:
            malito = emailsoup.find('a', {'target':'_top'})['href']
            email_list.append(malito)
        except:
            email_list.append('N/A')
        print(f'''
        Org Name: {org_name2}
        Email: {malito}
        ''')

Output:

Org Name: Eastern Elite  
Email: mailto:[email protected]&[email protected]&subject=Club Volleyball Inquiry from 
        

Org Name: Eastern Elite  
Email: mailto:[email protected]&[email protected]&subject=Club Volleyball Inquiry from 
        

Org Name: Eastern Elite  
Email: mailto:[email protected]&[email protected]&subject=Club Volleyball Inquiry from 
        

Org Name: Corpus Christi Legacy Volleyball Club  
Email: mailto:[email protected]&[email protected]&subject=Club Volleyball Inquiry from 
        

Org Name: Corpus Christi Legacy Volleyball Club  
Email: mailto:[email protected]&[email protected]&subject=Club Volleyball Inquiry from 
        

Org Name: Corpus Christi Legacy Volleyball Club  
Email: mailto:[email protected]&[email protected]&subject=Club Volleyball Inquiry from 

Answer:

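First, why loop over the parsed HTML at all? Iterating over a BeautifulSoup tag walks its child nodes, so "for es in emailsoup" runs once per child of the table, and each pass repeats the same emailsoup.find(...) lookups, which would explain the three copies. Selenium is not needed either, since the page is fetched with urlopen and the driver's result is never used. Second, please always include imports and variable definitions along with your code.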
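
To see why looping over the table repeats the result, here is a minimal sketch. The HTML string is hypothetical markup standing in for the real page (it only assumes a table with three rows), and it uses the built-in 'html.parser' so it runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page: a table with three rows.
HTML = (
    '<table class="table table-striped">'
    '<tr><td><h3>Example Club</h3></td></tr>'
    '<tr><td><a target="_top" href="mailto:info@example.com">Email</a></td></tr>'
    '<tr><td>Other details</td></tr>'
    '</table>'
)

soup = BeautifulSoup(HTML, 'html.parser')
table = soup.find('table', class_='table table-striped')

# Iterating over a Tag walks its direct children, so this loop runs once per
# row, and every pass performs the same table-wide lookup.
names_in_loop = [table.find('h3').text for _child in table]

# A single lookup on the table yields each value exactly once.
name_once = table.find('h3').text
email_once = table.find('a', {'target': '_top'})['href']

print(names_in_loop)
print(name_once, email_once)
```

The loop variable is never used inside the loop body, which is usually a sign that the loop itself is unnecessary.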

The following code works for me:

import time
from urllib.request import urlopen

from bs4 import BeautifulSoup

urls = ['http://www.usavolleyballclubs.com/VolleyballClubDirectory.asp?Customer_ID=26045',
        'http://www.usavolleyballclubs.com/VolleyballClubDirectory.asp?Customer_ID=36914']
email_list = []
org_name_list = []

for url in urls:
    time.sleep(3)  # pause between requests
    html = urlopen(url)
    soup = BeautifulSoup(html, 'lxml')
    # Look everything up once per page; do not loop over the table itself.
    email_soup = soup.find('table', class_="table table-striped")
    org_name = email_soup.find('h3').text
    org_name_list.append(org_name)
    try:
        # find() returns None when no link matches, so indexing raises TypeError.
        malito = email_soup.find('a', {'target': '_top'})['href']
    except (TypeError, KeyError):
        malito = 'N/A'
    email_list.append(malito)
    print(f'Org Name: {org_name} \nEmail: {malito}')