I'm new to Python and I'm trying to make a web scraper to get the name and the IP of Minecraft servers.
The problem is that I was able to get the value of the td cells, but, for example, the IP of the server is in a div inside the td. I'm using pandas and lxml.html.
Example:
<tr>
  <td>
    <p><a href="#1.akumamc.net"><span>#1</span></a></p>
  </td>
  <td align="center">
    <div>
      <p>this is the IP of the server</p> -I WANT TO GET THIS-
    </div>
  </td>
</tr>
I don't know how to reach the div inside the td. I have this script, taken from a web page, that works perfectly for the other values but not for getting at the nested content.
import requests
import lxml.html as lh
import pandas as pd

# https://www.servidoresminecraft.info/1.8/
url = 'https://topminecraftservers.org/version/1.8.8'
# Fetch the page
page = requests.get(url)
# Parse the contents of the website into doc
doc = lh.fromstring(page.content)
# Select the data stored between <tr>..</tr> in the HTML
tr_elements = doc.xpath('//tr')
# Check the length of the first 5 rows
[len(T) for T in tr_elements[:5]]
# Create empty list
col = []
i = 0
# For each cell of the first row (the header), store its text and an empty list
for t in tr_elements[0]:
    i += 1
    name = t.text_content()
    print('%d:"%s"' % (i, name))
    col.append((name, []))
# Since our first row is the header, data is stored from the second row onwards
for j in range(1, len(tr_elements)):
    # T is our j'th row
    T = tr_elements[j]
    # If the row is not of size 3, the //tr data is not from our table
    if len(T) != 3:
        break
    # i is the index of our column
    i = 0
    # Iterate through each element of the row
    for t in T.iterchildren():
        data = t.text_content()
        # Leave the first column as text
        if i > 0:
            # Convert any numerical value to an integer
            try:
                if i == 2 and j == 1:
                    print(2)
                data = int(data)
            except ValueError:
                pass
        # Append the data to the list of the i'th column
        col[i][1].append(data)
        # Increment i for the next column
        i += 1
[len(C) for (title, C) in col]
Dict = {title: column for (title, column) in col}
df = pd.DataFrame(Dict)
print(df.head())
I just want to get an output that shows a table with the name of the server and the IP:
Name     ip
server1  xxx.xxx.x.x
server2  xxx.xxx.x.x
Any help?
CodePudding user response:
If I understand you correctly, this should get you what you're looking for:
servers = []
cols = ["Name", "ip"]
for s in doc.xpath("//td[@class='server-name']"):
    s_ip = s.xpath(".//div[@class='server-ip input-group']//span[@class='form-control text-justify']/text()")[0]
    s_name = s.xpath('.//h4/a/span/text()')[0]
    servers.append([s_name, s_ip])
pd.DataFrame(servers, columns=cols)
Output:
   Name                            ip
0  AkumaMC                         akumamc.net
1  BattleAsya 1.8-1.16             play.battleasya.com
2  Caraotacraft network PRISON     caraotacraft.top
3  FlameSquad                      87.121.54.214:25568
4  LunixCraft                      lunixcraft.dk
etc.
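One thing to watch with the snippet above: indexing with [0] raises an IndexError whenever a cell lacks one of the nested elements. Here is a minimal sketch of the same relative-XPath idea with that case handled. It runs against an inline HTML sample rather than the live site; SAMPLE_HTML, first_text and extract_servers are names made up for illustration, and the markup only mimics the class names used in the XPath above, so the real page may differ.

```python
import lxml.html as lh

# Hypothetical markup that mimics the class names used in the XPath above.
# The structure of the real page is an assumption and may differ.
SAMPLE_HTML = """
<table>
  <tr>
    <td class="server-name">
      <h4><a href="#1"><span>AkumaMC</span></a></h4>
      <div class="server-ip input-group">
        <span class="form-control text-justify">akumamc.net</span>
      </div>
    </td>
  </tr>
  <tr>
    <td class="server-name">
      <h4><a href="#2"><span>NoIpServer</span></a></h4>
    </td>
  </tr>
</table>
"""


def first_text(node, xpath_expr):
    """Return the first text match of a relative XPath, stripped, or None."""
    matches = node.xpath(xpath_expr)
    return matches[0].strip() if matches else None


def extract_servers(html):
    """Collect (name, ip) pairs, one per td with class 'server-name'."""
    doc = lh.fromstring(html)
    rows = []
    for cell in doc.xpath("//td[@class='server-name']"):
        # The leading "." keeps each query relative to the current cell
        name = first_text(cell, ".//h4/a/span/text()")
        ip = first_text(
            cell,
            ".//div[@class='server-ip input-group']"
            "//span[@class='form-control text-justify']/text()",
        )
        rows.append((name, ip))
    return rows


servers = extract_servers(SAMPLE_HTML)
for name, ip in servers:
    print(name, ip)
```

The resulting list of pairs can be passed to pd.DataFrame(servers, columns=cols) exactly as in the answer; rows with a missing IP then show up as None instead of stopping the loop.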
