How to parse specific lines of data in Python

Time:01-05

I have a text file with lines of data. Each line ends with "# Source [number]". There could be multiple sources such as "# Source 1,3".

Example of the text:

This is line one   # Source 3
This is line two   # Source 2
This is line three # Source 4,5
This is line four  # Source 5
This is line five  # Source 2

Question: How can I parse only the lines whose sources I am interested in? I want the lines that cite a source numbered 4 or above. The result should be a list or dict as follows:

This is line three
This is line four

CodePudding user response:

Slightly overengineered but works exactly the way you want

import re

subject = """This is line one   # Source 3
This is line two   # Source 2
This is line three # Source 4,5
This is line four  # Source 5
This is line five  # Source 2
"""

# Capture the comma-separated numbers that follow "# Source" on each line.
# (The original character-class pattern could also match digits elsewhere in a line.)
matches = re.findall(r"#\s*Source\s+(\d+(?:\s*,\s*\d+)*)", subject)
# Reduce each match to its highest source number.
matches = [max(int(n) for n in m.split(",")) for m in matches]

# Keep only the text before the "#" on each line, without trailing spaces.
subject_lines = [line.split("#")[0].rstrip() for line in subject.splitlines()]

# Assumes every line carries a "# Source ..." suffix, so both lists line up.
for index, each_source_value in enumerate(matches):
    if each_source_value >= 4:
        print(subject_lines[index])

Output :

This is line three
This is line four

It also works with other subject strings, as long as every line follows the same "# Source ..." format.
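
If you want to reuse this, the same idea can be wrapped in a small function. This is only a minimal sketch, assuming every relevant line ends with "# Source" followed by comma-separated integers; the names SOURCE_RE and lines_with_source_at_least and the threshold parameter are made up for illustration and are not part of the answer above.

```python
import re

# Hypothetical helper: the names and signature are illustrative only.
# Group 1 is the text before "#", group 2 the comma-separated source numbers.
SOURCE_RE = re.compile(r"^(.*?)\s*#\s*Source\s+(\d+(?:\s*,\s*\d+)*)\s*$")


def lines_with_source_at_least(text, threshold):
    """Return the text of each line that cites a source >= threshold."""
    selected = []
    for line in text.splitlines():
        match = SOURCE_RE.match(line)
        if match is None:
            continue  # skip lines without a "# Source ..." suffix
        phrase, sources = match.groups()
        if any(int(n) >= threshold for n in sources.split(",")):
            selected.append(phrase)
    return selected


sample = """This is line one   # Source 3
This is line two   # Source 2
This is line three # Source 4,5
This is line four  # Source 5
This is line five  # Source 2
"""

print(lines_with_source_at_least(sample, 4))
# ['This is line three', 'This is line four']
```

Matching each line on its own, rather than collecting all numbers first, means a line without a source suffix is skipped instead of shifting the results.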

CodePudding user response:

This code will select lines based on the value given from input:

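# Convert the threshold once instead of on every comparison.
number = int(input("Get lines for sources greater than or equal to: "))

result = []

with open("D:/data.txt") as f:
    for line in f:
        # Split at the last "#"; lines without one (e.g. blank lines) are skipped.
        phrase, sep, comment = line.rpartition("#")
        if not sep:
            continue

        for item in comment.replace(",", " ").split():
            try:
                if int(item) >= number:
                    result.append(phrase.strip())
                    break
            except ValueError:
                # Ignore non-numeric tokens such as "Source".
                pass

print(result)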

For input value 4 this will output:

['This is line three', 'This is line four']
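
Since the question also mentions a dict, here is a rough sketch of a variant that maps each selected line to the source numbers it cites. It assumes the same "# Source ..." line format; the function name sources_by_line and its parameters are hypothetical, and identical line texts would overwrite each other in the dict.

```python
def sources_by_line(lines, minimum):
    """Map each qualifying phrase to the list of source numbers it cites."""
    result = {}
    for line in lines:
        # Split at the last "#"; skip lines that have none.
        phrase, sep, comment = line.rpartition("#")
        if not sep:
            continue
        # Keep only the numeric tokens, e.g. " Source 4,5" -> [4, 5].
        numbers = [int(tok) for tok in comment.replace(",", " ").split() if tok.isdigit()]
        if numbers and max(numbers) >= minimum:
            result[phrase.strip()] = numbers
    return result


sample_lines = [
    "This is line one   # Source 3\n",
    "This is line two   # Source 2\n",
    "This is line three # Source 4,5\n",
    "This is line four  # Source 5\n",
    "This is line five  # Source 2\n",
]

print(sources_by_line(sample_lines, 4))
# {'This is line three': [4, 5], 'This is line four': [5]}
```

Because the function only iterates over lines, an open file object can be passed in place of the sample list.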