Reading an XML file into various arrays

Time:01-08

I have the following XML file.

<dos>
 <tot>
   <diagram type="tot" ns="1">
     <point e="-3.000000000" d="2.000000000"/>
     <point e="-2.993993994" d="4.000000000"/>
     <point e="-2.987987988" d="5.000000000"/>
     <point e="-2.981981982" d="0.600000000"/>
     <point e="-2.963963964" d="0.600000000"/>
   </diagram>
 </tot>
 <part type="par" species="1">
   <diagram ns="1" l="0" m="0">
     <point e="-3.000000000" d="0.002000000"/>
     <point e="-2.993993994" d="0.300000000"/>
     <point e="-2.987987988" d="4.000000000"/>
     <point e="-2.981981982" d="0.90000000"/>
   </diagram>
   <diagram ns="1" l="1" m="-1">
     <point e="-3.000000000" d="0.005000000"/>
     <point e="-2.993993994" d="0.040000000"/>
     <point e="-2.987987988" d="0.0700000000"/>
     <point e="-2.981981982" d="0.800000000"/>
   </diagram>
 </part>
 <part type="par" species="2">
   <diagram ns="1" l="0" m="0">
     <point e="-3.000000000" d="2.002000000"/>
     <point e="-2.993993994" d="3.300000000"/>
     <point e="-2.987987988" d="1.000000000"/>
     <point e="-2.981981982" d="2.90000000"/>
   </diagram>
   <diagram ns="1" l="1" m="-1">
     <point e="-3.000000000" d="3.005000000"/>
     <point e="-2.993993994" d="4.040000000"/>
     <point e="-2.987987988" d="5.0700000000"/>
     <point e="-2.981981982" d="2.800000000"/>
   </diagram>
 </part>
</dos>

I would like to get all points in each "diagram" block and preferably save them in different variables. Using the following simple code, I could extract all of these values.

import numpy as np
from lxml import etree
from xml.dom import minidom

filedoss='./PDOS_RhSi/tmp.xml'

file = minidom.parse(filedoss)

tot = file.getElementsByTagName('tot')
pointsid = file.getElementsByTagName('point')

d_id = np.zeros((len(pointsid),2), dtype=float)

for i in range(len(pointsid)):
    d_id[i,0]=pointsid[i].attributes['e'].value
    d_id[i,1]=pointsid[i].attributes['d'].value

print(d_id)

which has the output of

[[-3.00000000e+00  2.00000000e+00]
 [-2.99399399e+00  4.00000000e+00]
 [-2.98798799e+00  5.00000000e+00]
 [-2.98198198e+00  6.00000000e-01]
 [-2.96396396e+00  6.00000000e-01]
 [-3.00000000e+00  2.00000000e-03]
 [-2.99399399e+00  3.00000000e-01]
 [-2.98798799e+00  4.00000000e+00]
 [-2.98198198e+00  9.00000000e-01]
 [-3.00000000e+00  5.00000000e-03]
 [-2.99399399e+00  4.00000000e-02]
 [-2.98798799e+00  7.00000000e-02]
 [-2.98198198e+00  8.00000000e-01]
 [-3.00000000e+00  2.00200000e+00]
 [-2.99399399e+00  3.30000000e+00]
 [-2.98798799e+00  1.00000000e+00]
 [-2.98198198e+00  2.90000000e+00]
 [-3.00000000e+00  3.00500000e+00]
 [-2.99399399e+00  4.04000000e+00]
 [-2.98798799e+00  5.07000000e+00]
 [-2.98198198e+00  2.80000000e+00]]

However, reading the XML file this way merges all five blocks into one array. How can I read the file so that each "diagram" block ends up in its own array, for example "tot", "par_species1_l0_m0", "par_species1_l1_m-1", "par_species2_l0_m0" and "par_species2_l1_m-1"?

CodePudding user response:

If I understand you correctly, this should get what you are after (or close enough):

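from lxml import etree

# xpath() and getparent() are lxml features, so parse with lxml rather than minidom
file = etree.parse(filedoss)

diags = file.xpath('//diagram')
for diag in diags:
    atrs = diag.getparent().attrib
    if len(atrs) > 0:
        # parent is a <part type="par" species="..."> element
        ptype = atrs.values()[0]
        spec = atrs.keys()[1]
        spec_val = atrs.values()[1]

        # skip the first attribute (ns) and keep l and m
        items = diag.attrib.items()[1:]
        l = "".join(items[0])
        m = "".join(items[1])

        print(f"{ptype}_{spec}{spec_val}_{l}_{m}")
    else:
        # <tot> has no attributes, so fall back to its tag name
        print(diag.getparent().tag)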

Output:

tot
par_species1_l0_m0
par_species1_l1_m-1
par_species2_l0_m0
par_species2_l1_m-1
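
The snippet above only prints the block names. To actually keep the points of each block apart, one option is to collect them in a dictionary keyed by those names. The following is only a sketch under a few assumptions: it uses the standard-library xml.etree.ElementTree instead of lxml, it inlines a shortened copy of the XML from the question so it runs on its own, and the names read_diagrams and blocks are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the XML from the question, inlined so the sketch is self-contained.
XML = """<dos>
 <tot>
   <diagram type="tot" ns="1">
     <point e="-3.000000000" d="2.000000000"/>
     <point e="-2.993993994" d="4.000000000"/>
   </diagram>
 </tot>
 <part type="par" species="1">
   <diagram ns="1" l="0" m="0">
     <point e="-3.000000000" d="0.002000000"/>
     <point e="-2.993993994" d="0.300000000"/>
   </diagram>
   <diagram ns="1" l="1" m="-1">
     <point e="-3.000000000" d="0.005000000"/>
     <point e="-2.993993994" d="0.040000000"/>
   </diagram>
 </part>
</dos>"""


def read_diagrams(root):
    """Return {name: [(e, d), ...]} with one entry per <diagram> block.

    Hypothetical helper; the naming scheme mirrors the names printed above.
    """
    blocks = {}
    for parent in root:  # direct children of <dos>: <tot> and <part>
        for diag in parent.findall("diagram"):
            if parent.tag == "part":
                name = (f"{parent.get('type')}_species{parent.get('species')}"
                        f"_l{diag.get('l')}_m{diag.get('m')}")
            else:
                # <tot> carries no attributes, so its tag is the name
                name = parent.tag
            blocks[name] = [(float(p.get("e")), float(p.get("d")))
                            for p in diag.findall("point")]
    return blocks


blocks = read_diagrams(ET.fromstring(XML))
for name, points in blocks.items():
    print(name, points)
```

To read from the file instead, ET.parse(filedoss).getroot() can be passed to the helper, and np.asarray(blocks["tot"]) should turn one block into an (n, 2) array like d_id in the question.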