How to parse specific lines of data in Python

I have a text file with lines of data. Each line ends with "# Source [number]". There could be multiple sources such as "# Source 1,3".

Example of the text:

This is line one   # Source 3
This is line two   # Source 2
This is line three # Source 4,5
This is line four  # Source 5
This is line five  # Source 2

Question: How can I parse only the lines from the sources I am interested in? I want to get the lines that have a source of 4 or above. The result should be a list or dict as follows:

This is line three
This is line four

CodePudding user response:

This is slightly overengineered, but it works exactly the way you want:

import re

subject = """This is line one   # Source 3
This is line two   # Source 2
This is line three # Source 4,5
This is line four  # Source 5
This is line five  # Source 2
"""

# Capture the number list that follows "# Source" on each line, e.g. "3" or "4,5".
matches = re.findall(r"#\s*Source\s+(\d+(?:,\d+)*)", subject)
# Reduce each number list to its highest source number.
matches = [max(map(int, m.split(","))) for m in matches]

# Keep only the text before the "#" marker, without trailing spaces.
# This assumes every line carries a "# Source" marker, so indexes line up.
subject_lines = [line.split("#")[0].rstrip() for line in subject.splitlines()]

for index, each_source_value in enumerate(matches):
    if each_source_value >= 4:
        print(subject_lines[index])

Output:

This is line three
This is line four

It also works with other subject strings that follow the same format.
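To try the same idea on other subject strings, it could be wrapped in a small function. This is only a minimal sketch: the name `lines_with_min_source` and the `minimum` parameter are illustrative assumptions, not part of the answer above, and it assumes each marker has the form "# Source n" or "# Source n,m".

```python
import re

# Hypothetical helper: the names here are illustrative assumptions.
# Group 1 is the line text, group 2 the comma-separated source numbers.
SOURCE_RE = re.compile(r"^(.*?)\s*#\s*Source\s+(\d+(?:\s*,\s*\d+)*)\s*$")


def lines_with_min_source(text, minimum):
    """Return the text of each line whose highest source number is >= minimum."""
    selected = []
    for line in text.splitlines():
        match = SOURCE_RE.match(line)
        if match is None:
            continue  # skip lines without a "# Source ..." marker
        numbers = [int(n) for n in match.group(2).split(",")]
        if max(numbers) >= minimum:
            selected.append(match.group(1))
    return selected


sample = "This is line one   # Source 3\nThis is line three # Source 4,5\n"
print(lines_with_min_source(sample, 4))  # prints ['This is line three']
```

Matching line by line, rather than indexing into two parallel lists, means a line without a marker is skipped instead of shifting the results.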

CodePudding user response:

This code will select lines based on the value given from input:

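# Convert the threshold once instead of on every comparison.
number = int(input("Get lines for source greater than or equal to: "))

result = []

with open("D:/data.txt") as f:
    for line in f:
        # Skip lines without a "#" marker (for example, blank lines).
        if "#" not in line:
            continue
        # Split on the last "#" so a "#" inside the text itself is kept.
        phrase, comment = line.rsplit("#", 1)

        for item in comment.replace(",", " ").split():
            try:
                if int(item) >= number:
                    result.append(phrase.strip())
                    break
            except ValueError:
                # Ignore non-numeric tokens such as "Source".
                pass

print(result)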

For input value 4 this will output:

['This is line three', 'This is line four']
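Since the question also allows a dict as the result, here is a rough sketch of a variant that maps each selected line to its source numbers. The function name `select_by_source` and the `min_source` parameter are assumptions made for illustration, and the sample data is held in a list rather than read from a file so the snippet runs on its own.

```python
def select_by_source(lines, min_source):
    """Map each selected line's text to its list of source numbers."""
    selected = {}
    for line in lines:
        # rpartition splits on the last "#"; sep is empty if there is none.
        phrase, sep, comment = line.rpartition("#")
        if not sep:
            continue  # no "#" marker on this line
        # Keep only the numeric tokens, dropping words such as "Source".
        sources = [int(tok) for tok in comment.replace(",", " ").split() if tok.isdigit()]
        if any(s >= min_source for s in sources):
            # Note: identical line texts would overwrite each other here.
            selected[phrase.strip()] = sources
    return selected


data = [
    "This is line one   # Source 3",
    "This is line two   # Source 2",
    "This is line three # Source 4,5",
    "This is line four  # Source 5",
    "This is line five  # Source 2",
]
print(select_by_source(data, 4))
# prints {'This is line three': [4, 5], 'This is line four': [5]}
```

To use it with a file, an open file object can be passed in place of `data`, since iterating over a file yields its lines.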