I have the following sample text file:
DATASET
OBJTYPE "mesh2d"
BEGALD
ND 58673
NC 116294
TIMEUNITS SECONDS
TS 0 1.98849600e+08
0.000000000e+00
0.56000000e+00
0.200000000e+00
0.00000000e+00
0.100000000e+00
0.00000000e+00
0.00000000e+00
0.73400000e+00
TS 0 1.98853209e+08
0.00000000e+00
1.00500000e+00
4.00000000e+00
6.00000000e-05
9.00000000e+00
0.00000000e+00
0.00000000e+00
TS 0 1.98856959e+08
0.00000000e+00
1.38000000e+00
4.00000000e+00
3.00000000e-05
8.10000000e+00
2.45000000e+00
0.00000000e+00
0.00000000e+00
TS 0 1.98860419e+08
0.00000000e+00
1.40000000e+00
7.00000000e+00
3.00000000e-05
9.00000000e+00
0.00000000e+00
0.00000000e+00
0.00000000e+00
TS 0 1.98864081e+08
0.00000000e+00
0.00000000e+00
0.00000000e+00
3.00000000e-05
0.00000000e+00
0.00000000e+00
0.00000000e+00
0.00000000e+00
TS 0 1.98867619e+08
0.00000000e+00
0.00000000e+00
8.00000000e+00
3.50000000e-05
10.00000000e+00
0.00000000e+00
5.50000000e+00
0.00000000e+00
ENDDS
I want to extract the time stamp from every line starting with 'TS 0 ', together with the 2nd, 5th and 8th lines after each such match. The file is huge (more than 10 GB), so I don't want to read the whole file into memory.
This is what I could come up with:
with open(r"file") as f:
    for line in f:
        if line.startswith("TIMEUNITS SECONDS"):
            break  # the file handle will continue from the next line
    time = []   # list for storing time stamps
    line2 = []  # or lines = [2, 5, 8]
    line5 = []
    line8 = []
    for line in f:
        if line.startswith("TS"):
            print(line.strip())  # extract all TS
            ts = float(line.split()[2])
            time.append(ts)
This only extracts the time stamps. How can I also extract the 2nd, 5th and 8th lines after each match, using a loop or any faster method, without reading the whole file?
Answer:
A file object in Python is its own iterator, so it keeps its position between loops and between calls to next(). You already rely on this to skip the initial section. Keep using the same technique to reach the lines you need:
with open(r"file") as f:
    # Skip the header; f keeps its position after the break.
    for line in f:
        if line.startswith("TIMEUNITS SECONDS"):
            break
    time = []
    line2 = []
    line5 = []
    line8 = []
    for line in f:
        if line.startswith("TS"):
            ts = float(line.strip().split()[2])
            time.append(ts)
            # Advance 2 lines to reach the 2nd line after the TS line.
            for _ in range(2):
                line = next(f)
            line2.append(float(line.strip()))
            # 3 more lines: the 5th line after the TS line.
            for _ in range(3):
                line = next(f)
            line5.append(float(line.strip()))
            # 3 more lines: the 8th line after the TS line.
            for _ in range(3):
                line = next(f)
            line8.append(float(line.strip()))
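To see why mixing a for loop with next() on the same object works, here is a small sketch that uses io.StringIO as an in-memory stand-in for the real file. The sample contents are made up for illustration and are not taken from your data:

```python
import io

# In-memory stand-in for a file; iterating it yields one line at a time,
# just like a real text file object.
f = io.StringIO("header\nTS 0 1.0e+00\na\nb\nc\nTS 0 2.0e+00\nd\ne\nf\n")

seen = []
for line in f:
    if line.startswith("TS"):
        next(f)           # consume the 1st line after the match
        second = next(f)  # the 2nd line after the match
        seen.append(second.strip())

# The for loop resumes after the lines consumed by next().
print(seen)  # ['b', 'e']
```

Only one line is held in memory at a time, which is what makes this approach suitable for a very large file.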
Now that you have the basic structure down, you can factor out the repeated code into functions and add some error checking (the := operator used here requires Python 3.8 or later):
def find(file, s):
    """Return the next line starting with s, or None at end of file."""
    for line in file:
        if line.startswith(s):
            return line
    return None

def skip(file, n):
    """Advance n lines and return the n-th one, or None if the file ends first."""
    line = None
    for _ in range(n):
        line = next(file, None)
        if line is None:
            break
    return line

def load(filename):
    with open(filename) as f:
        if not find(f, "TIMEUNITS SECONDS"):
            return None
        time = []
        line2 = []
        line5 = []
        line8 = []
        while True:
            if not (line := find(f, "TS")):
                break
            time.append(float(line.strip().split()[2]))
            if not (line := skip(f, 2)):
                break
            line2.append(float(line.strip()))
            if not (line := skip(f, 3)):
                break
            line5.append(float(line.strip()))
            if not (line := skip(f, 3)):
                break
            line8.append(float(line.strip()))
    return time, line2, line5, line8
