Get an element inside a <td> with Python

Time:01-13

I'm new to Python and I'm trying to make a web scraper to get the name and the IP of Minecraft servers.

The problem is that I was able to get the value of the <td>, but the IP of the server, for example, is in a <div> inside the <td>. I'm using pandas and lxml.html.

example:

<tr>
        <td >
            <p><a href="#1.akumamc.net"><span >#1</span></a></p>
        </td>

        <td  align="center"> 
           <div >
              <p> this is the ip of the server </p>     -I WANT TO GET HERE-
           </div> 
        </td>
</tr>

I don't know how to get to the div inside the <td>. I have this script, taken from another page, that works perfectly for the other values but not for reaching the nested element.

import requests
import lxml.html as lh
import pandas as pd

#https://www.servidoresminecraft.info/1.8/

url='https://topminecraftservers.org/version/1.8.8'
#Create a handle, page, to handle the contents of the website
page = requests.get(url)
#Store the contents of the website under doc
doc = lh.fromstring(page.content)
#Parse data that are stored between <tr>..</tr> of HTML
tr_elements = doc.xpath('//tr')



#Check the length of the first 5 rows
[len(T) for T in tr_elements[:5]]

tr_elements = doc.xpath('//tr')
#Create empty list
col=[]
i=0
#For each cell of the first row (the header), store its text and an empty list
for t in tr_elements[0]:
    i+=1
    name=t.text_content()
    print ('%d:"%s"'%(i,name))
    col.append((name,[]))

#Since our first row is the header, data is stored from the second row onwards
for j in range(1,len(tr_elements)):
    #T is our j'th row
    T=tr_elements[j]
    
    #If row is not of size 3, the //tr data is not from our table 
    if len(T)!=3:
        break
    
    #i is the index of our column
    i=0
    
    #Iterate through each element of the row
    for t in T.iterchildren():
        data=t.text_content() 
        #Skip the first column, which is kept as text
        if i>0:
            #Convert any numerical value to integers
            try:
                data=int(data)
            except ValueError:
                #Leave non-numeric values as strings
                pass
        #Append the data to the empty list of the i'th column
        col[i][1].append(data)
        #Increment i for the next column
        i+=1

[len(C) for (title,C) in col]

Dict={title:column for (title,column) in col}
df=pd.DataFrame(Dict)


print(df.head())

I just want an output that shows a table with the name of the server and the IP:

Name      ip
server1   xxx.xxx.x.x
server2   xxx.xxx.x.x

Any help?

CodePudding user response:

If I understand you correctly, this should get you what you're looking for:

servers = []
cols = ["Name", "ip"]
for s in doc.xpath("//td[@class='server-name']"):
    s_ip = s.xpath(".//div[@class='server-ip input-group']//span[@class='form-control text-justify']/text()")[0]
    s_name = s.xpath('.//h4/a/span/text()')[0]
    servers.append([s_name,s_ip])
pd.DataFrame(servers, columns = cols)

Output:

    Name                          ip
0   AkumaMC                       akumamc.net
1   BattleAsya 1.8-1.16           play.battleasya.com
2   Caraotacraft network PRISON   caraotacraft.top
3   FlameSquad                    87.121.54.214:25568
4   LunixCraft                    lunixcraft.dk

etc.
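
The snippet above indexes each XPath result with [0], which raises IndexError if a row lacks the expected element. As a minimal sketch of the same relative-XPath idea, the example below runs on an inline HTML string modelled on the question's example row rather than on the live page. The markup, the second row, and the helper names first_text and extract_servers are assumptions made for illustration; the real site's structure and class names may differ.

```python
import lxml.html as lh

# Hypothetical markup modelled on the question's example row.
# The second row deliberately has no <div> to show the fallback.
SAMPLE_HTML = """
<table>
  <tr>
    <td><p><a href="#1.akumamc.net"><span>AkumaMC</span></a></p></td>
    <td align="center"><div><p>akumamc.net</p></div></td>
  </tr>
  <tr>
    <td><p><a href="#2.example"><span>NoIpServer</span></a></p></td>
    <td align="center"></td>
  </tr>
</table>
"""


def first_text(node, path):
    """Return the first stripped text match for a relative XPath, or None."""
    matches = node.xpath(path)
    return matches[0].strip() if matches else None


def extract_servers(html):
    """Collect one {"Name", "ip"} dict per <tr> found in the HTML."""
    doc = lh.fromstring(html)
    rows = []
    for tr in doc.xpath("//tr"):
        # The leading "." keeps each query relative to the current row.
        name = first_text(tr, "./td[1]//span/text()")
        ip = first_text(tr, "./td[2]/div/p/text()")
        rows.append({"Name": name, "ip": ip})
    return rows


servers = extract_servers(SAMPLE_HTML)
for row in servers:
    print(row)
```

Passing the resulting list of dicts to pd.DataFrame(servers) should give the same two-column table, with a missing value wherever a row had no IP.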
