I am trying to split a string by commas using Python, but users may include commas inside the values of some of the key:value pairs. Here are two examples of the strings I am working with:
title.search:The relation between visualization size, grouping, and user performance,publication_year:2020
author.id:c33432,title.search:The relation between visualization size, grouping, and user performance,publication_year:2020
What I want this to turn into is:
["title.search:The relation between visualization size, grouping, and user performance", "publication_year:2020"]
["author.id:c33432", "title.search:The relation between visualization size, grouping, and user performance", "publication_year:2020"]
What helps me is that the part before the colon (the key) will always be written in one of three formats, such as:
- type
- author.id
- author.institutions.country_code
So it can be a single word, two words separated by a period, or three words separated by periods.
Any ideas on whether this is possible?
CodePudding user response:
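As far as I can see, you want to split only on commas that sit directly between two word characters, with no space after the comma. A regex for that is (?<=\w),(?=\w). The lookarounds keep the neighbouring characters out of the match, whereas a plain \w,\w would consume them when used with re.split.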
CodePudding user response:
Would you please try the following:
#!/usr/bin/python
import re

s = ['title.search:The relation between visualization size, grouping, and user performance,publication_year:2020',
     'author.id:c33432,title.search:The relation between visualization size, grouping, and user performance,publication_year:2020']
for line in s:
    # split only on commas that are followed by a key and a colon
    m = re.split(r',(?=\s*[\w.]+:)', line)
    print(m)
Output:
['title.search:The relation between visualization size, grouping, and user performance', 'publication_year:2020']
['author.id:c33432', 'title.search:The relation between visualization size, grouping, and user performance', 'publication_year:2020']
The regex ,(?=\s*[\w.]+:) matches a comma followed by
- zero or more whitespace characters
- one or more word characters and/or dots
- a colon character
in that order. The (?=...) part is a lookahead, so only the comma itself is consumed.
The string is then split on every comma that satisfies the condition above.
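If you want the lookahead to follow the key format from the question more closely (one, two, or three words separated by periods), the character class can be replaced with an explicit pattern. The following is only a sketch under that assumption; the names KEY_BOUNDARY, split_filters, and parse_filters are made up for illustration and are not part of any library.

```python
import re

# Assumption: a key is one to three "words" joined by dots, e.g.
# "type", "author.id", "author.institutions.country_code".
KEY_BOUNDARY = re.compile(r',(?=\s*\w+(?:\.\w+){0,2}:)')


def split_filters(text):
    """Split on commas that are immediately followed by a key and a colon."""
    return [part.strip() for part in KEY_BOUNDARY.split(text)]


def parse_filters(text):
    """Turn 'key:value,key:value' into a dict, cutting each pair at its first colon."""
    pairs = (part.partition(':') for part in split_filters(text))
    return {key: value for key, _, value in pairs}


if __name__ == '__main__':
    example = ('author.id:c33432,title.search:The relation between '
               'visualization size, grouping, and user performance,'
               'publication_year:2020')
    print(split_filters(example))
    print(parse_filters(example))
```

Note that either version will still split inside a value that happens to contain a comma followed by a word and a colon (for example "title.search:Foo, bar: baz"), so if such titles are possible you would probably need to check the candidate key against a list of known keys.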
