from cgitb import text
from bs4 import BeautifulSoup
import requests
website = 'https://www.marketplacehomes.com/rent-a-home/'
result = requests.get(website)
content = result.text
soup = BeautifulSoup(content, 'html.parser')
lists = soup.find_all('div', class_=('tt-rental-row'))
for list in lists:
location = list.find('span', class_="renta;-adress")
beds = list.find('span', class_="renta;-beds")
baths = list.find('span', class_="renta;-beds")
availability = list.find('span', class_="rental-date-available")
info = [location, beds, baths, availability]
print(info)
If I try to run the last line of code, I get:
"IndentationError: expected an indented block"
If I try to run each indentation separately I get:
">>> location = list.find('span', class_="renta;-adress")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: type object 'list' has no attribute 'find'"
I'm new to Python and I'm kinda stuck, can anyone please help me?
CodePudding user response:
Since the word list is a built-in keyword in python you can't use it as variable name try another name
for myList in lists:
location = myList.find('span', class_="renta;-adress")
beds = myList.find('span', class_="renta;-beds")
baths = myList.find('span', class_="renta;-beds")
availability = myList.find('span', class_="rental-date-available")
info = [location, beds, baths, availability]
print(info)
CodePudding user response:
It's not possible to extract data from HTML DOM using bs4 because data is dynamically populating via API.
Example:
import requests
import pandas as pd
api_url = 'https://app.tenantturner.com/listings-json/2679'
req=requests.get(api_url).json()
info=[]
for item in req:
location = item['address']
beds = item['beds']
baths = item['baths']
availability = item['dateAvailable']
info.append({
'location':location,
'beds':beds,
'baths':baths,
'availability':availability})
df= pd.DataFrame(info)
print(df)
Output:
location beds baths availability
0 4481 Jack Faulk St 4 2 Now
1 213 Skybranch Court 3 2.5 Now
2 8127 Olive Brook Dr 3 2.0 Now
3 735 Grace Ave 3 2.0 Now
4 1810 E 41st St 3 1.0 Now
.. ... ... ... ...
86 447 Union Station St 2 2.5 Now
87 5819 Fairdale Lane C 3 3.5 Now
88 10020 Braxton Dr 4 3 Now
89 709 Hilchot Dr 3 2.5 Now
90 5042 Pike Loop 3 2.5 Now
[91 rows x 4 columns]
CodePudding user response:
Note: Your code never runs the for-loop cause your selection never matches the elements in HTML. They are generated dynamically based on data from another ressource and requests do not render websites like a browser, it only uses static contents from response.
Be aware not to use built-in keywords they will cause errors, especialy in your case list.find() will raise one cause the type object 'list' do not has an attribute called find. You could simply check these things using type()
type(soup)
-> its a bs4.BeautifulSoup
type(soup.find_all('div', class_=('tt-rental-row')))
-> its a bs4.element.ResultSet
type(list)
-> its a type
So how to get your goal?
You could also use pandas to directly create a DataFrame and slice it to your needs:
import pandas as pd
pd.read_json('https://app.tenantturner.com/listings-json/2679')
Output:
id dateActivated latitude longitude address city state zip photo title ... baths dateAvailable rentAmount acceptPets applyUrl btnUrl btnText virtualTour propertyType enableWaitlist
0 83600 8/22/2022 35.750499 -86.393972 4481 Jack Faulk St Murfreesboro TN 37127 https://ttimages.blob.core.windows.net/propert... 4481 Jack Faulk St ... 2.0 Now 2195 cats, small dogs, large dogs https://app.propertyware.com/pw/application/#/... https://app.tenantturner.com/qualify/4481-jack... Schedule Viewing None Single Family False
1 100422 8/31/2022 30.277607 -95.472842 213 Skybranch Court Conroe TX 77304 https://ttimages.blob.core.windows.net/propert... 213 Skybranch Court ... 2.5 Now 2100 cats, small dogs, large dogs https://app.propertyware.com/pw/application/#/... https://app.tenantturner.com/qualify/213-skybr... Schedule Viewing None Condo Unit False
2 106976 7/27/2022 28.274720 -82.298077 8127 Olive Brook Dr Wesley Chapel FL 33545 https://ttimages.blob.core.windows.net/propert... 8127 Olive Brook Dr ... 2.0 Now 2650 no pets https://app.propertyware.com/pw/application/#/... https://app.tenantturner.com/qualify/8127-oliv... Schedule Viewing None Single Family False
3 116188 8/15/2022 42.624023 -83.144614 735 Grace Ave Rochester Hills MI 48307 https://ttimages.blob.core.windows.net/propert... 735 Grace Ave ... 2.0 Now 1600 cats, small dogs, large dogs https://app.propertyware.com/pw/application/#/... https://app.tenantturner.com/qualify/735-grace... Schedule Viewing None Single Family False
4 126846 8/22/2022 32.046455 -81.071181 1810 E 41st St Savannah GA 31404 https://ttimages.blob.core.windows.net/propert... 1810 E 41st St ... 1.0 Now 1395 small dogs https://app.propertyware.com/pw/application/#/... https://app.tenantturner.com/qualify/1810-e-41... Schedule Viewing None Single Family True
...
91 rows × 22 columns
Example:
To show only specifc columns, simply pass a list of there names.
import pandas as pd
pd.read_json('https://app.tenantturner.com/listings-json/2679')[['address', 'city','state', 'zip', 'title', 'beds', 'baths','dateAvailable']]
Output
address beds baths dateAvailable
0 4481 Jack Faulk St 4 2.0 Now
1 213 Skybranch Court 3 2.5 Now
2 8127 Olive Brook Dr 3 2.0 Now
3 735 Grace Ave 3 2.0 Now
4 1810 E 41st St 3 1.0 Now
... ... ... ... ...
91 rows × 4 columns
