I have the following list, and I would like to split it into several lists wherever an element is exactly "\n".
Input:
['chain 2109 chrY 59373566 1266734 1266761 chrX 156040895 1198245 1198272 20769290\n', '27\n','\n','chain 2032 chrY 59373566 1136192 1136219 chrX 156040895 1086629 1086656 4047064\n','27\n','\n']
Expected output:
[
['chain 2109 chrY 59373566 1266734 1266761 chrX 156040895 1198245 1198272 20769290', '27'],
['chain 2032 chrY 59373566 1136192 1136219 chrX 156040895 1086629 1086656 4047064', '27']
]
I tried stripping the trailing "\n" from each element, then used a modified version of the accepted answer from this post:
for i, n in enumerate(lst):
    if n != "\n":
        lst[i] = lst[i].rstrip("\n")
[item.split(",") for item in ','.join(lst).split('\n') if item]
But since I join and split on a comma instead of a single space, I get empty strings ("") in the resulting lists. How can I prevent this?
[
['chain 2109 chrY 59373566 1266734 1266761 chrX 156040895 1198245 1198272 20769290','27',''],
['','chain 2032 chrY 59373566 1136192 1136219 chrX 156040895 1086629 1086656 4047064','27','']
]
CodePudding user response:
Does this work for you?
list1 = ['chain 2109 chrY 59373566 1266734 1266761 chrX 156040895 1198245 1198272 20769290\n', '27\n','\n','chain 2032 chrY 59373566 1136192 1136219 chrX 156040895 1086629 1086656 4047064\n','27\n','\n']
list2 = []
tmp = []
for item in list1:
    if item != '\n':
        tmp.append(item.rstrip('\n'))
    else:
        # A lone '\n' is only a separator: close the current group without adding the item itself
        list2.append(tmp)
        tmp = []
if tmp:
    # Keep a trailing group when the input does not end with '\n'
    list2.append(tmp)
print(list2)
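The same grouping can also be written with itertools.groupby from the standard library. This is only a minimal sketch, assuming the separator is an element that is exactly '\n'; the function name split_on_newline is made up for illustration.

```python
from itertools import groupby


def split_on_newline(lines):
    # groupby yields runs of consecutive items that share the same key.
    # The key is True for a lone '\n' separator and False for everything else,
    # so keeping only the False runs drops the separators.
    return [
        [item.rstrip('\n') for item in group]
        for is_separator, group in groupby(lines, key=lambda s: s == '\n')
        if not is_separator
    ]


sample = ['a 1\n', '27\n', '\n', 'b 2\n', '27\n', '\n']
print(split_on_newline(sample))
```

Because consecutive separators fall into a single run, this version does not produce empty sublists, and it also keeps a final group when the input does not end with '\n'.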
CodePudding user response:
I would recommend splitting your list with more_itertools.split_at.
Because your original list ends with the separator, '\n', splitting it leaves an empty sublist as the final item of the result. The "if sublist" check excludes it.
from more_itertools import split_at
original = [
    'chain 2109 chrY 59373566 1266734 1266761 chrX 156040895 1198245 1198272 20769290\n',
    '27\n',
    '\n',
    'chain 2032 chrY 59373566 1136192 1136219 chrX 156040895 1086629 1086656 4047064\n',
    '27\n',
    '\n'
]
processed = [
    [item.rstrip() for item in sublist]
    for sublist in split_at(original, lambda i: i == '\n')
    if sublist
]
print(processed)
Output (line break added for clarity):
[['chain 2109 chrY 59373566 1266734 1266761 chrX 156040895 1198245 1198272 20769290', '27'],
['chain 2032 chrY 59373566 1136192 1136219 chrX 156040895 1086629 1086656 4047064', '27']]
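If you would rather keep the join-and-split idea from the question, one option is to join the unstripped elements with an empty string, so that each lone '\n' shows up as a blank line, and then split on '\n\n'. This is only a sketch under two assumptions: every element still ends with '\n' (as in the input shown), and separators never occur twice in a row. The name split_blocks is hypothetical.

```python
def split_blocks(lines):
    # Joining without a delimiter avoids the stray '' entries that the
    # comma-based join/split produced around each separator.
    text = ''.join(lines).rstrip('\n')
    # A lone '\n' element following a line that ends in '\n' forms '\n\n'.
    return [block.split('\n') for block in text.split('\n\n') if block]


sample = ['a 1\n', '27\n', '\n', 'b 2\n', '27\n', '\n']
print(split_blocks(sample))
```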
