Scraping a text item with BeautifulSoup

Time:01-24

I used the following call to get the HTML below from a website: aa = contents.find_all('p', attrs={"class": "css-1ccncw", 'data-font-weight': 'semibold'})

This is my code:

from bs4 import BeautifulSoup

url = '''[<p  data-font-weight="semibold">Sponsored<!-- --> <span aria-hidden="true" ><svg  height="16" width="16"><path d="M8 14.75A6.75 6.75 0 1114.75 8 6.757 6.757 0 018 14.75zm0-12A5.25 5.25 0 1013.25 8 5.256 5.256 0 008 2.75z"></path><path d="M8 11.605a.75.75 0 01-.75-.75v-3a.75.75 0 011.5 0v3a.75.75 0 01-.75.75zM8 6.15a.998.998 0 01-.92-.62 1 1 0 011.63-1.09c.182.189.285.439.29.7a.996.996 0 01-.62.93 1 1 0 01-.38.08z"></path></svg></span></p>, <p  data-font-weight="semibold">Verified by Business<!-- --> <span aria-hidden="true" ><svg  height="18" width="18"><path d="M9 1a8 8 0 100 16A8 8 0 009 1zm3.96 6.279l-4.808 4.808L5.04 8.976a.8.8 0 011.131-1.131l1.981 1.979 3.677-3.676a.799.799 0 111.131 1.131z"></path></svg></span></p>, <p  data-font-weight="semibold">945 Taraval St Ste 201 San Francisco, CA 94116</p>]'''

soup=BeautifulSoup(url,'lxml')
for sd in url:
    add = sd[1]
    print(add)

I want to extract the address: 945 Taraval St Ste 201 San Francisco, CA 94116

I tried several methods, but none of them worked.

CodePudding user response:

Assuming the format doesn't change, you can use this code:

soup.find_all("p", {"class": "css-1ccncw"})[2].text

The .find_all method returns every element that matches the given conditions. The first argument is the tag name, and the second is a dict of attributes the tag must have. The [2] index then picks the third match, which here is the address paragraph.
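As a minimal, self-contained sketch of that call: the markup below is a shortened, hypothetical version of the question's HTML (the class value is the one quoted in the question, the icons are left out), and it uses the built-in html.parser so it runs without lxml installed.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened markup; the class value is taken from the question.
html = '''
<p class="css-1ccncw" data-font-weight="semibold">Sponsored</p>
<p class="css-1ccncw" data-font-weight="semibold">Verified by Business</p>
<p class="css-1ccncw" data-font-weight="semibold">945 Taraval St Ste 201 San Francisco, CA 94116</p>
'''

soup = BeautifulSoup(html, "html.parser")  # 'lxml' also works if it is installed

# First argument: tag name. Second argument: dict of attributes the tag must have.
matches = soup.find_all("p", {"class": "css-1ccncw", "data-font-weight": "semibold"})

# Collect the visible text of every match; index 2 is the third <p>.
texts = [p.get_text(strip=True) for p in matches]
print(texts[2])
```

Indexing by position only holds while the page keeps the same number of badges in front of the address, so it is worth checking len(matches) before relying on [2].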

CodePudding user response:

Like this:

from bs4 import BeautifulSoup

url = '''[<p  data-font-weight="semibold">Sponsored<!-- --> <span aria-hidden="true" ><svg  height="16" width="16"><path d="M8 14.75A6.75 6.75 0 1114.75 8 6.757 6.757 0 018 14.75zm0-12A5.25 5.25 0 1013.25 8 5.256 5.256 0 008 2.75z"></path><path d="M8 11.605a.75.75 0 01-.75-.75v-3a.75.75 0 011.5 0v3a.75.75 0 01-.75.75zM8 6.15a.998.998 0 01-.92-.62 1 1 0 011.63-1.09c.182.189.285.439.29.7a.996.996 0 01-.62.93 1 1 0 01-.38.08z"></path></svg></span></p>, <p  data-font-weight="semibold">Verified by Business<!-- --> <span aria-hidden="true" ><svg  height="18" width="18"><path d="M9 1a8 8 0 100 16A8 8 0 009 1zm3.96 6.279l-4.808 4.808L5.04 8.976a.8.8 0 011.131-1.131l1.981 1.979 3.677-3.676a.799.799 0 111.131 1.131z"></path></svg></span></p>, <p  data-font-weight="semibold">945 Taraval St Ste 201 San Francisco, CA 94116</p>]'''

soup=BeautifulSoup(url,'lxml')
print(soup.text)

This prints all the text inside the soup as a single string. To get only the address, try this:

address = soup.find_all("p",class_="css-1ccncw")[-1]
print(address.text)
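If the markup you are parsing does not carry the class attribute (the snippet pasted in the question does not), the class filter returns an empty list and [-1] raises an IndexError. A sketch that filters on data-font-weight instead and does not depend on position is below. It rests on an assumption that is not confirmed by the page itself: that the badge paragraphs ("Sponsored", "Verified by Business") always contain an icon span and the address paragraph never does. The markup is a shortened version of the sample, with the SVG icons replaced by empty spans.

```python
from bs4 import BeautifulSoup

# Shortened version of the sample from the question (SVG icons replaced by empty spans).
html = '''
<p data-font-weight="semibold">Sponsored<!-- --> <span aria-hidden="true"></span></p>
<p data-font-weight="semibold">Verified by Business<!-- --> <span aria-hidden="true"></span></p>
<p data-font-weight="semibold">945 Taraval St Ste 201 San Francisco, CA 94116</p>
'''

soup = BeautifulSoup(html, "html.parser")

# Match on an attribute that is actually present in the markup.
paragraphs = soup.find_all("p", attrs={"data-font-weight": "semibold"})

# Assumption: badge paragraphs carry an icon <span>, the address paragraph does not.
addresses = [p.get_text(strip=True) for p in paragraphs if p.find("span") is None]
print(addresses)
```

This returns a list, so an empty result can be handled explicitly instead of failing on an index.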