Cannot convert XML to Python Dataframe

Time:01-21

I have an XML file that looks something like the one below (it is longer, so I did not paste the whole thing). I am trying to read it with read_xml, but it just prints a table full of NaN values. How can I resolve this? (I am a newbie when it comes to XML files.)

import numpy as np
import pandas as pd

from tkinter import filedialog as fd

filename = fd.askopenfilename()
 
df = pd.read_xml(filename)

print(df)



<ScheduleMessage xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" DtdVersion="3" DtdRelease="3">
  <MessageIdentification v="20211022_DA_POS_65XGENEXMARKET0I" />
  <MessageVersion v="1" />
  <MessageType v="A01" />
  <ProcessType v="A01" />
  <ScheduleClassificationType v="A01" />
  <SenderIdentification v="65XGENEXMARKET0I" codingScheme="A01" />
  <SenderRole v="A01" />
  <ReceiverIdentification v="10X1001C--00007L" codingScheme="A01" />
  <ReceiverRole v="A04" />
  <MessageDateTime v="2021-10-21T10:02:02Z" />
  <ScheduleTimeInterval v="2021-10-21T22:00Z/2021-10-22T22:00Z" />
  <ScheduleTimeSeries>
    <SendersTimeSeriesIdentification v="S_10Y1001A1001B012_65YBG-ENERGRIDDB" />
    <SendersTimeSeriesVersion v="1" />
    <BusinessType v="A02" />
    <Product v="8716867000016" />
    <ObjectAggregation v="A03" />
    <InArea v="10Y1001A1001B012" codingScheme="A01" />
    <OutArea v="10Y1001A1001B012" codingScheme="A01" />
    <InParty v="65YBGGENEX000002" codingScheme="A01" />
    <OutParty v="65YBG-ENERGRIDDB" codingScheme="A01" />
    <MeasurementUnit v="MAW" />
    <Period>
      <TimeInterval v="2021-10-21T22:00Z/2021-10-22T22:00Z" />
      <Resolution v="PT1H" />
      <Interval>
        <Pos v="1" />
        <Qty v="0" />
      </Interval>

      

CodePudding user response:

I would start by validating the XML file. Based on what you have shared, it does not look like a valid XML file.
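
One way to run that check is sketched below with Python's standard library parser. The helper name is_well_formed and the two sample strings are made up for illustration; the check only confirms that every tag is closed, not that the file follows any schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical samples: the first stops mid-document like the pasted snippet,
# the second closes every tag it opens.
TRUNCATED = '<ScheduleMessage><Period><Interval><Pos v="1" /></Interval>'
COMPLETE = TRUNCATED + "</Period></ScheduleMessage>"


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False on a parse error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(TRUNCATED))
print(is_well_formed(COMPLETE))
```

For a file on disk, ET.parse(filename) raises the same ParseError when the document is not well-formed.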

To read an XML file into pandas and then convert it to a CSV or Excel file, you can use the pandas_read_xml library:

import pandas_read_xml as pdx

You can then read the file with the following line:

df = pdx.read_xml('path-to-your-XML-file.xml')

You also need to flatten the result after reading the XML file:

df = pdx.fully_flatten(df)
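
As an alternative that needs no extra package, here is a minimal sketch using the standard library parser. In the posted file the values sit in the v attribute of each element rather than in element text, which is a likely reason the default read_xml call shows NaN. The sample string is a shortened, closed-off stand-in for the real file, the second Interval is invented for illustration, and the helper name intervals_to_rows is hypothetical.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Shortened stand-in for the file in the question. Assumption: the real file
# closes every tag. The second Interval is made-up illustration data.
SAMPLE_XML = """<ScheduleMessage DtdVersion="3" DtdRelease="3">
  <MessageIdentification v="20211022_DA_POS_65XGENEXMARKET0I" />
  <ScheduleTimeSeries>
    <SendersTimeSeriesIdentification v="S_10Y1001A1001B012_65YBG-ENERGRIDDB" />
    <MeasurementUnit v="MAW" />
    <Period>
      <TimeInterval v="2021-10-21T22:00Z/2021-10-22T22:00Z" />
      <Resolution v="PT1H" />
      <Interval>
        <Pos v="1" />
        <Qty v="0" />
      </Interval>
      <Interval>
        <Pos v="2" />
        <Qty v="5" />
      </Interval>
    </Period>
  </ScheduleTimeSeries>
</ScheduleMessage>"""


def intervals_to_rows(root):
    """Collect one dict per Interval, reading values from the 'v' attributes."""
    rows = []
    for series in root.iter("ScheduleTimeSeries"):
        series_id = series.find("SendersTimeSeriesIdentification").get("v")
        for interval in series.iter("Interval"):
            rows.append({
                "series": series_id,
                "Pos": int(interval.find("Pos").get("v")),
                "Qty": float(interval.find("Qty").get("v")),
            })
    return rows


root = ET.fromstring(SAMPLE_XML)
rows = intervals_to_rows(root)
df = pd.DataFrame(rows)
print(df)
```

To run this against the real file, replace ET.fromstring(SAMPLE_XML) with ET.parse(filename).getroot(), where filename is the path chosen in the question's file dialog.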