This is my first time asking for help on Stack Overflow, and I am in way over my head.
I am working on a project where I need to take percentage-based coordinate tuples from very large, variable-length XML files, split them into separate X and Y lists, and then find the average difference between the values in each list.
I am currently stuck on splitting the tuples into X and Y lists.
import xml.etree.ElementTree as ET
lem = []
tree = ET.parse('testdata.xml')
root = tree.getroot()
# Collect the 'Value' attribute of every left-eye gaze point
for GazePointOnDisplayArea in root.findall("./GazeData/Left/GazePointOnDisplayArea"):
    le = GazePointOnDisplayArea.get('Value')
    lem.append(le)
print(lem)
#A test xml file shortened to five elements gives the following output
['(0.48734050, 0.50727710)', '(0.48989120, 0.50335540)', '(0.48709830, 0.50172430)', '(0.48531740, 0.50473010)', '(0.48797150, 0.51031550)']
Ideally I'd like to end up with
x = [0.48734050, 0.48989120, 0.48709830, 0.48531740, 0.48797150]
y = [0.50727710, 0.50335540, 0.50172430, 0.50473010, 0.51031550]
I've tried *zip and mapping methods but nothing seems to work with this. I'm unsure if I've made a parsing error, or if it is to do with there being a decimal, or whatever else.
I am open to using python, numpy, or pandas.
Please advise.
CodePudding user response:
The output you're getting is one line away from the output you want. First extract the numbers with a regular expression, then use NumPy to rearrange them:
import re
import numpy as np
text = ['(0.48734050, 0.50727710)', '(0.48989120, 0.50335540)', '(0.48709830, 0.50172430)', '(0.48531740, 0.50473010)', '(0.48797150, 0.51031550)']
x, y = np.array([[float(x) for x in re.findall(r"(\d+\.\d+)", line)] for line in text]).T
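Once x and y are NumPy arrays, the average difference mentioned in the question can be computed with np.diff. This is only a sketch: it assumes "average difference" means the mean absolute difference between consecutive values, which the question does not actually define, and the helper name mean_consecutive_diff is made up for illustration.

```python
import re
import numpy as np

text = ['(0.48734050, 0.50727710)', '(0.48989120, 0.50335540)',
        '(0.48709830, 0.50172430)', '(0.48531740, 0.50473010)',
        '(0.48797150, 0.51031550)']

# Pull the two decimal numbers out of each string, then transpose
# the (n, 2) array so that x and y each become a 1-D array.
x, y = np.array([[float(v) for v in re.findall(r"\d+\.\d+", line)]
                 for line in text]).T


def mean_consecutive_diff(values):
    """Mean absolute difference between neighbouring values.

    Hypothetical helper; the definition of "average difference" is assumed.
    """
    return float(np.mean(np.abs(np.diff(values))))


print(mean_consecutive_diff(x))
print(mean_consecutive_diff(y))
```

If the signed difference is what you need instead, dropping the np.abs call should be enough.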
CodePudding user response:
The issue is that each element of your list isn't really a tuple but a string that looks like one.
So you first need to convert each string into an actual Python tuple.
e.g. '(0.48734050, 0.50727710)' to (0.48734050, 0.50727710)
ast.literal_eval() helps us do that!
import ast
lem = ['(0.48734050, 0.50727710)', '(0.48989120, 0.50335540)', '(0.48709830, 0.50172430)', '(0.48531740, 0.50473010)', '(0.48797150, 0.51031550)']
x_list = []
y_list = []
for str_tuple in lem:
    x, y = ast.literal_eval(str_tuple)
    x_list.append(x)
    y_list.append(y)
>>> print(x_list)
[0.4873405, 0.4898912, 0.4870983, 0.4853174, 0.4879715]
>>> print(y_list)
[0.5072771, 0.5033554, 0.5017243, 0.5047301, 0.5103155]
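The question mentions trying zip; it does work once the strings have been parsed into real tuples. A minimal sketch, assuming every entry is a well-formed '(x, y)' string (the names pairs, x_values and y_values are just for illustration):

```python
import ast

lem = ['(0.48734050, 0.50727710)', '(0.48989120, 0.50335540)',
       '(0.48709830, 0.50172430)', '(0.48531740, 0.50473010)',
       '(0.48797150, 0.51031550)']

# Parse each string into a real tuple of two floats.
pairs = [ast.literal_eval(s) for s in lem]

# zip(*pairs) transposes the list of (x, y) pairs into two tuples,
# which are then converted to lists.
x_values, y_values = (list(column) for column in zip(*pairs))

print(x_values)
print(y_values)
```

Applying zip(*lem) directly to the raw strings would instead transpose them character by character, which is likely why the earlier attempts did not work.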
