Convert a bytes iterable to an iterable of str, where each value is a line

Time:01-09

I have an iterable of bytes, such as

bytes_iter = (
    b'col_1,',
    b'c',
    b'ol_2\n1',
    b',"val',
    b'ue"\n',
)

(though typically it would not be hard-coded or available all at once, but supplied by, say, a generator), and I want to convert it to an iterable of str lines. The line-break style is not known up front: it could be any of \r, \n or \r\n. In this case the result would be:

lines_iter = (
    'col_1,col_2',
    '1,"value"',
)

(Again, only as an iterable, so the whole input is never held in memory at once.)

How can I do this?

Context: my aim is to pass the iterable of str lines on to csv.reader (which I think needs whole lines?), but I'm interested in the answer in general.
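For reference, csv.reader accepts any iterable that yields str, not only a file object, so an iterable of complete lines like lines_iter above can be passed to it directly. A minimal check using the question's expected lines (the tuple stands in for a generator):

```python
import csv

# Any iterable of str lines works; a tuple stands in for a generator here
lines_iter = (
    'col_1,col_2',
    '1,"value"',
)

rows = list(csv.reader(lines_iter))
print(rows)  # [['col_1', 'col_2'], ['1', 'value']]
```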

Answer:

I used yield and re.split.

The pattern \r\n|\r|\n matches \r\n, \r or \n.
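The order of the alternatives matters: \r\n has to come before \r, otherwise a Windows line break is split as \r followed by \n and produces an empty "line" in between. A quick check on a string mixing all three endings:

```python
import re

text = "a\r\nb\rc\nd"

# \r\n listed first: a Windows line break is consumed as one separator
good = re.split(r"\r\n|\r|\n", text)
# \r listed first: \r\n is split as \r then \n, leaving an empty string between
bad = re.split(r"\r|\n|\r\n", text)

print(good)  # ['a', 'b', 'c', 'd']
print(bad)   # ['a', '', 'b', 'c', 'd']
```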

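import codecs
import re

# \r\n must come before \r so a Windows line break counts as one separator
split_rule = re.compile(r"\r\n|\r|\n")


def converter(byte_data, encoding="utf-8"):
    # Incremental decoding copes with a multi-byte character split across chunks
    decoder = codecs.getincrementaldecoder(encoding)()
    left_d = ""  # text after the last line break seen so far
    for d in byte_data:
        text = left_d + decoder.decode(d)
        # Hold back a trailing \r: its \n may arrive at the start of the next chunk
        held = ""
        if text.endswith("\r"):
            text, held = text[:-1], "\r"
        *complete, left_d = split_rule.split(text)
        yield from complete
        left_d += held
    # Flush the decoder and split whatever is left
    *complete, left_d = split_rule.split(left_d + decoder.decode(b"", final=True))
    yield from complete
    # Skip the empty remainder that follows a trailing line break
    if left_d:
        yield left_d


for i in converter(iter((
    b'col_1,',
    b'c',
    b'ol_2\n1',
    b',"val;',
    b'ue"\n',
))):
    print(i)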

Output:

col_1,col_2
1,"val;ue"
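Calling .decode() on each chunk independently breaks when a multi-byte UTF-8 character is split between two chunks, which can happen with data arriving from a generator. The chunk contents below are made up for illustration; codecs.getincrementaldecoder is the standard-library tool that buffers the incomplete bytes until the rest arrives:

```python
import codecs

# A UTF-8 "é" (b'\xc3\xa9') split across two chunks, as a generator might deliver it
chunks = [b'caf\xc3', b'\xa9\n']

# Decoding each chunk on its own fails on the incomplete byte sequence
try:
    ''.join(chunk.decode('utf-8') for chunk in chunks)
except UnicodeDecodeError as exc:
    print('per-chunk decode failed:', exc.reason)

# An incremental decoder keeps the incomplete bytes until the next call
decoder = codecs.getincrementaldecoder('utf-8')()
text = ''.join(decoder.decode(chunk) for chunk in chunks)
text += decoder.decode(b'', final=True)  # flush; raises if bytes are left over
print(repr(text))  # 'café\n'
```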

Answer:

Use the io module to do most of the work for you:

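import io


class ReadableIterator(io.IOBase):
    def __init__(self, it):
        self.it = iter(it)

    def read(self, n=-1):
        if n is None or n < 0:
            # No size given: read everything up to EOF
            return b''.join(self.it)
        # TextIOWrapper accepts chunks of any length, so n can be ignored here.
        # Empty chunks are skipped because b'' signals EOF, and the b'' default
        # suppresses StopIteration, which is critical
        return next((chunk for chunk in self.it if chunk), b'')

    def readable(self):
        return True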

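Then wrap it with io.TextIOWrapper(ReadableIterator(some_iterable_of_bytes), encoding='utf-8') and iterate over the result to get one str per line. Passing encoding explicitly avoids depending on the locale's default. With the default newline=None, TextIOWrapper recognises \r, \n and \r\n and translates each to \n, which stays at the end of every line (strip it with line.rstrip('\n') if you don't want it). For csv.reader, pass newline='' instead, as the csv documentation recommends, so line endings are passed through untranslated.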
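The effect of the newline argument can be checked directly. In this sketch io.BytesIO stands in for ReadableIterator so the snippet runs on its own; the byte strings are made up for illustration:

```python
import csv
import io

data = b'a\rb\r\nc\n'

# Default newline=None: universal newlines, every ending translated to '\n'
translated = list(io.TextIOWrapper(io.BytesIO(data), encoding='utf-8'))
# newline='': lines still split on \r, \n and \r\n, but endings are kept as-is
untranslated = list(io.TextIOWrapper(io.BytesIO(data), encoding='utf-8', newline=''))

print(translated)    # ['a\n', 'b\n', 'c\n']
print(untranslated)  # ['a\r', 'b\r\n', 'c\n']

# csv.reader removes the line endings itself when reading with newline=''
csv_bytes = b'col_1,col_2\r\n1,"value"\n'
stream = io.TextIOWrapper(io.BytesIO(csv_bytes), encoding='utf-8', newline='')
rows = list(csv.reader(stream))
print(rows)  # [['col_1', 'col_2'], ['1', 'value']]
```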
