I have a large DataFrame of coordinates, and I have to write its content into an input file for another program. For now I have a nested, quite complex for-loop and many .write() statements - it takes a lot of time to write the file. I can't find nor figure out any smarter way to do it.
DataFrame
| type | Uiso | x | y | z |
|---|---|---|---|---|
| H | 0.0320 | 0.257510 | 0.254363 | 0.021930 |
| H | 0.0330 | 0.255228 | 0.163941 | 0.038431 |
| H | 0.0330 | 0.255228 | 0.163941 | 0.038431 |
| C | 0.0278 | 0.122879 | 0.207314 | 0.027545 |
| H | 0.0320 | 0.534974 | 0.254363 | 0.021930 |
Input
- First line: atom type.
- Second line: number of these atom types in structure, atom number (H: 1, C: 6, etc.), then occupancy (always 1), then Uiso.
- Rest of the lines: x y z coordinates.
Repeat for each atom type in DataFrame.
H
4 1 1.000 0.0320
0.257510 0.254363 0.021930
0.255228 0.163941 0.038431
0.255228 0.163941 0.038431
0.534974 0.254363 0.021930
C
1 6 1.000 0.0278
0.122879 0.207314 0.027545
Code:
f = open(file_name ".inp", "w")
for type in structure["type"].unique():
for Uiso in structure[structure["type"] == type]["Uiso"].unique():
f.write(" " type "\n")
f.write(" " str(len(list(structure[(structure["type"] == type) & (structure["Uiso"] == Uiso)][["x", "y", "z"]].iterrows()))) " ")
f.write(str(atomic_numbers[type]))
f.write(str(" 1.000 "))
f.write(str(Uiso) "\n")
for coord in structure[(structure["type"] == type) & (structure["Uiso"] == Uiso)].itertuples():
f.write(" "
str(coord.x) " "
str(coord.y) " "
str(coord.z) "\n")
f.close()
Question
Is there a way to reduce the number of loops? Maybe write whole sorted portions of DataFrame at once?
CodePudding user response:
There are lots of ways to approach this, e.g. making a template string then filling it with format(), but your case might be compact enough to do it on the fly, so to speak.
For example, this does more or less what you want for a DataFrame df:
import numpy as np
with open('data.inp', 'wt') as f:
for atom, group in df.groupby('type'):
# Write the header info using an f-string.
f.write(f"{atom}\n {len(group)} 1 1.000 {group['Uiso'].iloc[0]:.4f}\n")
# Write the data table using savetxt for fixed-width columns.
data = group.loc[:, ['x', 'y', 'z']]
np.savetxt(f, data, fmt='.6f', newline='\n')
This produces:
C
1 1 1.000 0.0278
0.122879 0.207314 0.027545
H
4 1 1.000 0.0320
0.257510 0.254363 0.021930
0.255228 0.163941 0.038431
0.255228 0.163941 0.038431
0.534974 0.254363 0.021930
Some notes:
- Always use a context for file reading and writing (the
withblock). It's the safest thing to do. - I tried using
pd.to_csv()but I don't think it can do fixed width columns, whereasnp.savetxt()can. - I sort of guessed/made up the info on the second line of each record; you might need to change some of that.
- You could change the sort order by throwing
sorted()around thegroupbyusing some appropriate key (e.g. a table to look up atomic number or something).
