Scraping tables on a web page with BeautifulSoup

I need to build a DataFrame in Python from the ranking of the top 500 companies in Latin America:

https://www.americaeconomia.com/negocios-industrias/estas-son-las-500-mayores-empresas-de-america-latina-2021

I tried web scraping, but print(tabla) prints [] or None:

from bs4 import BeautifulSoup
import requests

url = 'https://www.americaeconomia.com/negocios-industrias/estas-son-las-500-mayores-empresas-de-america-latina-2021'
page = requests.get(url)

soup = BeautifulSoup(page.text, 'html.parser')

tabla = soup.find('table', {"id":"awesomeTable"})
print(tabla)

CodePudding user response：

What happens?

Always inspect your soup first; therein lies the truth. The HTML that requests receives can differ slightly or substantially from what the browser's developer tools show.

You won't find the table in your soup because it sits inside an iframe, which is a separate document loaded from its own URL.
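
A quick way to confirm this is to check the soup for the table and list any iframes it contains. The following is a minimal sketch, not the site's actual markup: SAMPLE_PAGE is a hypothetical stand-in for the outer page, and inspect_page is an illustrative helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the outer page: the table is not in this HTML,
# only an iframe that points at the embedded ranking.
SAMPLE_PAGE = """
<html><body>
  <h1>Ranking</h1>
  <iframe src="https://rk.americaeconomia.com/display/embed/500-latam/2021"></iframe>
</body></html>
"""


def inspect_page(html):
    """Return (table_found, iframe_sources) for the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    # find() returns None when no matching tag exists in the soup
    table_found = soup.find("table", {"id": "awesomeTable"}) is not None
    # src=True keeps only iframes that actually carry a src attribute
    iframe_sources = [frame["src"] for frame in soup.find_all("iframe", src=True)]
    return table_found, iframe_sources


table_found, iframe_sources = inspect_page(SAMPLE_PAGE)
print(table_found)
print(iframe_sources)
```

With a real page you would pass page.text instead of SAMPLE_PAGE. If the table is missing but an iframe source is listed, that source is the URL to request.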

How to fix?

Use the URL of the iframe's source for your request:

https://rk.americaeconomia.com/display/embed/500-latam/2021

Example

import pandas as pd
import requests
from bs4 import BeautifulSoup

# Browser-like user agent sent with the request
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'}
# Request the iframe source directly instead of the outer page
r = requests.get('https://rk.americaeconomia.com/display/embed/500-latam/2021', headers=headers)
soup = BeautifulSoup(r.text, 'lxml')

# Collect the cell texts of every data row
data = []
for row in soup.select('#awesomeTable tbody tr.dataRow'):
    data.append(list(row.stripped_strings))

# The first row of the table supplies the column names
df = pd.DataFrame(data, columns=list(soup.select_one('#awesomeTable tr').stripped_strings))
print(df)

Output

RK 2021	EMPRESA	PAÍS
1	PETROBRAS	BRA
2	JBS	BRA
3	AMÉRICA MÓVIL	MX
4	PEMEX	MX
5	VALE	BRA
...	...	...
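
One caveat with the example above: stripped_strings skips empty strings, so a row containing an empty cell yields fewer values than there are columns and its values no longer line up with the header. Whether the real table has empty cells is not known here. The following is a minimal sketch of a per-cell alternative, assuming a hypothetical sample table with made-up company names; parse_table is an illustrative helper, not part of any library.

```python
from bs4 import BeautifulSoup

# Hypothetical sample table; the second data row has an empty country cell.
SAMPLE_TABLE = """
<table id="awesomeTable">
  <thead><tr><th>RK 2021</th><th>EMPRESA</th><th>PAÍS</th></tr></thead>
  <tbody>
    <tr class="dataRow"><td>1</td><td>EMPRESA A</td><td>BRA</td></tr>
    <tr class="dataRow"><td>2</td><td>EMPRESA B</td><td></td></tr>
  </tbody>
</table>
"""


def parse_table(html, table_id="awesomeTable"):
    """Return (header, rows), reading one value per cell so empty cells are kept."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"id": table_id})
    # Assumes the first <tr> of the table is the header row
    first_row = table.find("tr")
    header = [cell.get_text(strip=True) for cell in first_row.find_all(["th", "td"])]
    # One entry per <td>, so an empty cell becomes '' instead of disappearing
    rows = [
        [cell.get_text(strip=True) for cell in tr.find_all("td")]
        for tr in table.select("tbody tr.dataRow")
    ]
    return header, rows


header, rows = parse_table(SAMPLE_TABLE)
print(header)
print(rows)
```

The result can be passed to pd.DataFrame(rows, columns=header) in the same way as in the example above.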