I have a text file with lines of data. Each line ends with "# Source [number]". A line can list multiple sources, such as "# Source 1,3".
Example of the text:
This is line one # Source 3
This is line two # Source 2
This is line three # Source 4,5
This is line four # Source 5
This is line five # Source 2
Question: How can I extract only the lines whose sources I am interested in? I want the lines that have a source of 4 or above. The result should be a list or dict as follows:
This is line three
This is line four
CodePudding user response:
Slightly overengineered but works exactly the way you want
import re
subject = """This is line one # Source 3
This is line two # Source 2
This is line three # Source 4,5
This is line four # Source 5
This is line five # Source 2
"""
# Capture the comma-separated numbers that follow "# Source" on each line.
matches = re.findall(r"#\s*Source\s+(\d+(?:,\d+)*)", subject)
# Reduce each match to its highest source number.
matches = list(map(lambda x: max(map(int, x.split(","))), matches))
# Keep only the text before the "#" marker, without trailing spaces.
subject_lines = subject.splitlines()
subject_lines = list(map(lambda z: z.split("#")[0].strip(), subject_lines))
for index, each_source_value in enumerate(matches):
    if each_source_value >= 4:
        print(subject_lines[index])
Output:
This is line three
This is line four
It also works with other subject strings in the same format, provided every line carries a source tag.
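If some lines might lack a source tag, pairing matches with lines by index can drift out of step. A minimal sketch that parses each line on its own instead; the names `SOURCE_TAG` and `filter_by_source` and the `minimum` parameter are hypothetical, and the pattern assumes tags look like "# Source 4,5":

```python
import re

# Hypothetical pattern: line text, then "# Source", then comma-separated numbers.
SOURCE_TAG = re.compile(r"^(.*?)\s*#\s*Source\s+(\d+(?:\s*,\s*\d+)*)\s*$")

def filter_by_source(text, minimum=4):
    """Return the text of each line whose highest source number is >= minimum."""
    kept = []
    for line in text.splitlines():
        match = SOURCE_TAG.match(line)
        if match is None:
            continue  # untagged line: skip it instead of misaligning indexes
        phrase, numbers = match.groups()
        if max(int(n) for n in numbers.split(",")) >= minimum:
            kept.append(phrase)
    return kept

sample = "This is line one # Source 3\nno tag here\nThis is line three # Source 4,5"
print(filter_by_source(sample))  # ['This is line three']
```

Because the text and the numbers come from the same match, an untagged line in the middle cannot shift the results.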
CodePudding user response:
This code selects lines based on a threshold read from user input:
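# Convert the threshold once instead of on every comparison.
number = int(input("Get lines for source greater than or equal to: "))
result = []
with open("D:/data.txt") as f:
    for line in f:
        # partition never raises, so blank lines or lines without "#" are skipped.
        phrase, _, comment = line.partition("#")
        for item in comment.replace(",", " ").split():
            try:
                if int(item) >= number:
                    result.append(phrase.strip())
                    break
            except ValueError:
                # Ignore non-numeric tokens such as "Source".
                pass
print(result)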
For input value 4 this will output:
['This is line three', 'This is line four']
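Since the question also allows a dict, here is a rough sketch that maps each kept line to its list of source numbers. The function name `lines_by_source` is hypothetical. It accepts any iterable of lines, so an open file object should work as well as a list:

```python
def lines_by_source(lines, minimum):
    """Map each qualifying line's text to its list of source numbers."""
    result = {}
    for line in lines:
        phrase, _, comment = line.partition("#")
        # Keep only numeric tokens; this drops the word "Source".
        sources = [int(tok) for tok in comment.replace(",", " ").split() if tok.isdecimal()]
        if any(s >= minimum for s in sources):
            result[phrase.strip()] = sources
    return result

data = [
    "This is line three # Source 4,5\n",
    "This is line five # Source 2\n",
]
print(lines_by_source(data, 4))  # {'This is line three': [4, 5]}
```

Note that two lines with identical text would share one key, so the list form is safer if duplicates can occur.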
