I have an xlsx file, where each row corresponds to a sample with associated features in each column, as shown here: xlsx file example
I am trying to convert this xlsx file into a dat file, with multiple spaces separating the columns, as displayed in the example below:
samples property feature1 feature2 feature3
sample1 3.0862 0.8626 0.7043 0.6312
sample2 2.8854 0.7260 0.7818 0.6119
sample3 0.6907 0.4943 0.0044 0.4420
sample4 0.9902 0.0106 0.0399 0.9877
sample5 0.7242 0.0970 0.3199 0.5504
I have tried doing this by creating a dataframe in pandas and using dataframe.to_csv to save the file as a .dat, but it only allows me to use one character as a delimiter. Does anyone know how I might go about creating a file like the one above?
CodePudding user response:
You can use the string representation to_string of the dataframe, imported by pandas from Excel:
df = pd.read_excel('input.xlsx')
with open ('output.dat', 'w') as f:
f.write(df.to_string(index=False))
CodePudding user response:
This is another approach to do so without using DataFrame. We will have more flexibility since we do all the structure ourselves from the ground up.
Suppose you have read the xlsx file and store it in the form of 2-d list as follows:
lines = [['sample1', 3.0862, 0.8626, 0.7043, 0.6312],
['sample2', 2.8854, 0.7260, 0.7818, 0.6119],
['sample3', 0.6907, 0.4943, 0.0044, 0.4420],
['sample4', 0.9902, 0.0106, 0.0399, 0.9877],
['sample5', 0.7242, 0.0970, 0.3199, 0.5504]]
We can make use of string methods like ljust, rjust, or center. Right here, I just show you the use of ljust that takes the length as the first argument. The length will be the total width for left justification.
One could also use f-string to do padding in the format of f'{var:^10.4f}'. The meaning of each component is:
^represents centering (can be changed to<for left justification or>for right justification)10is the padding length.4is the number of decimal placesfmeans float
So, here is the final script.
padding1 = 12
padding2 = 10
print('samples'.ljust(padding1 1) 'property ' 'feature1 ' 'feature2 ' 'feature3')
for line in lines:
text = line[0].ljust(padding1)
for i in range(1, len(line)):
text = f'{line[i]:^{padding2}.4f}'
print(text)
