I have a multiple sequence alignment file such as the following:
JF735120.1_1-200    TCTTCACGCAGAAAGCGTCTAGCCATGGCGTTAGTATGAGTGTCGTACAGCCTCCAGGCC
NC_009823.1_1-200   TCTTCACGCAGAAAGCGTCTAGCCATGGCGTTAGTATGAGTGTCGTACAGCCTCCAGGCC
KM349851.1_1-200    TCTTCACGCAGAAAGCGTCTAGCCATGGCGTTAGTATGAGTGTCGTACAGCCTCCAGGCC
JF735122.1_1-200    TCTTCACGCGGAAAGCGTCTAGCCATGGCGTTAGTACGAGTGTCGTGCAGCCTCCAGGCC
AF177036.1_1-200    --TTCACGCAGAAAGCGTCTAGCCATGGCGTTAGTATGAGTGTCGTACAGCCTCCAGGAC
                      ******* ******* ****************** ********* *********** *
Using Python, how do I iterate over the file, find the asterisks, and print only the characters of the previous line that sit at the same positions as the asterisks, without using any external tools? The output should look like this:
TTCACGC GAAAGCG CTAGCCATGGCGTTAGTA GAGTGTCGT CAGCCTCCAGG C
Answer:
Perhaps something like this?
#!/usr/bin/env python3
from io import StringIO
exampleData = '''
line 1
line 2
***
xyz
pqr
***
'''
prevLine = None
for line in StringIO(exampleData):
    # Check if the line has an asterisk, if so we want to print the previous line
    if line.find('*') != -1:
        if prevLine is not None:
            print(prevLine)
        else:
            print("ERROR: no previous line")
    # Track the previous line, every loop.
    prevLine = line.strip()
This outputs:
line 2
pqr
Is that what you want?
Of course, you'd replace StringIO with a real file (open(filename) or similar); this is just an example.
That said, if the file is in a standard format, such as tab-separated values, I'd recommend reading it with an appropriate library like csv rather than parsing it by hand like this.
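The previous-line tracking above can be adapted to what the question actually asks: instead of printing the whole previous line, keep only the characters that sit directly above each run of asterisks. The sketch below is a minimal illustration, not the answerer's code. It assumes the asterisk line is padded so every `*` is in the same column as the character it marks, and it treats a line made only of spaces and `*`, `:` or `.` as the asterisk line (the extra marks are an assumption about what a real alignment file may contain). The function name `conserved_blocks` is hypothetical.

```python
from io import StringIO

# Assumption: a consensus line holds only spaces and the conservation marks
# '*', ':' and '.'; only the '*' columns are extracted.
CONSENSUS_CHARS = set("*:. ")


def conserved_blocks(lines):
    """For each consensus line, return the runs of the previous line under '*'."""
    results = []
    prev_line = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines but keep the remembered previous line
        if set(line) <= CONSENSUS_CHARS:
            if prev_line is None:
                continue  # consensus line with nothing above it
            blocks, run = [], []
            for pos, mark in enumerate(line):
                if mark == "*" and pos < len(prev_line):
                    run.append(prev_line[pos])  # character directly above the '*'
                elif run:
                    blocks.append("".join(run))  # any other column ends the run
                    run = []
            if run:
                blocks.append("".join(run))
            results.append(blocks)
        else:
            prev_line = line  # a sequence line becomes the new "previous line"
    return results


example = StringIO(
    "seq1  ACGTAC\n"
    "seq2  ACGAAC\n"
    "      *** **\n"
)
print(conserved_blocks(example))  # [['ACG', 'AC']]
```

With a real file, an open file object can be passed instead of the StringIO, e.g. `with open('alignment.txt') as fh: print(conserved_blocks(fh))`, where `alignment.txt` is a placeholder name.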
Answer:
Contents of the file c:/test.txt:
JF735120.1_1-200    TCTTCACGCAGAAAGCGTCTAGCCATGGCGTTAGTATGAGTGTCGTACAGCCTCCAGGCC
NC_009823.1_1-200   TCTTCACGCAGAAAGCGTCTAGCCATGGCGTTAGTATGAGTGTCGTACAGCCTCCAGGCC
KM349851.1_1-200    TCTTCACGCAGAAAGCGTCTAGCCATGGCGTTAGTATGAGTGTCGTACAGCCTCCAGGCC
JF735122.1_1-200    TCTTCACGCGGAAAGCGTCTAGCCATGGCGTTAGTACGAGTGTCGTGCAGCCTCCAGGCC
AF177036.1_1-200    --TTCACGCAGAAAGCGTCTAGCCATGGCGTTAGTATGAGTGTCGTACAGCCTCCAGGAC
                      ******* ******* ****************** ********* *********** *
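import re
p = re.compile(r'[*]+')  # one or more consecutive asterisks
prev_l = ""
with open('c:/test.txt') as my_file:
    for line in my_file:
        # The asterisk line starts with whitespace or '*', not a sequence name
        if re.match(r'[\s*]+', line) and prev_l:
            iterator = p.finditer(line)
            # Cut the previous line at the start/end of each asterisk run
            f = [prev_l[match.start():match.end()] for match in iterator]
            print(f)
        else:
            prev_l = line
Output: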
['TTCACGC', 'GAAAGCG', 'CTAGCCATGGCGTTAGTA', 'GAGTGTCGT', 'CAGCCTCCAGG', 'C']
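A regular expression is not strictly needed for the slicing step. As a standard-library alternative (a different technique from the `finditer` approach above, not the answerer's code), each mark on the asterisk line can be paired with the character directly above it, and `itertools.groupby` can then collect consecutive asterisk columns. This is a minimal sketch that assumes the two lines are column-aligned; `blocks_under_stars` is a hypothetical helper name.

```python
from itertools import groupby


def blocks_under_stars(prev_line, star_line):
    """Return the pieces of prev_line that sit under runs of '*' in star_line."""
    # Pair each mark with the character directly above it; zip() stops at the
    # shorter line, so columns past the end of either line are ignored.
    columns = zip(star_line, prev_line)
    blocks = []
    # groupby() collects consecutive columns that are all '*' (or all not '*').
    for is_star, group in groupby(columns, key=lambda pair: pair[0] == "*"):
        if is_star:
            blocks.append("".join(char for _, char in group))
    return blocks


print(blocks_under_stars("seq2  ACGAAC", "      *** **"))  # ['ACG', 'AC']
```

Inside the loop of the code above, `f = blocks_under_stars(prev_l, line)` could stand in for the `finditer` list comprehension, since a trailing newline on either line is never an asterisk column.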
