I have a string like this:
str1 = "ZZZ。10月,AAA。11月2日,BBB。CCC。3日,DDD。EEE。12月,FFF"
I want to split this string wherever two conditions hold at once: the next piece begins with a date marker (digits followed by 日 or 月), and the previous piece ends with a period 。. The result should look like this:
# ZZZ。 / 10月,AAA。 / 11月2日,BBB。CCC。 / 3日,DDD。EEE。 / 12月,FFF
My current idea is to split on the period first, then merge the pieces back together according to the second rule (日 or 月). The code looks like this:
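import re
str1 = "ZZZ。10月,AAA。11月2日,BBB。CCC。3日,DDD。EEE。12月,FFF"
res = []
# Split after every 。 (the lookbehind keeps the period with its piece)
for i, item in enumerate(re.split(r'(?<=。)', str1)):
    if i == 0:
        cache = item
    else:
        # A piece starting with 日 or 月 within its first three characters opens a new group
        if re.match(r'(^.{0,2}日)|(^.{0,2}月)', item):
            res.append(cache)
            cache = item
        else:
            # Otherwise the piece belongs to the current group
            cache += item
res.append(cache)
print(res)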
But I was wondering whether the two checks,
re.match(r'(^.{0,2}日)|(^.{0,2}月)', item) and re.match(r'。$', item), can be combined directly in one loop, or replaced by a single simple regex?
CodePudding user response:
You can use re.split with
(?<=。)(?=\s*\d{1,2}[日月])
Details:
(?<=。) - match a location right after a 。
(?=\s*\d{1,2}[日月]) - that is immediately followed by zero or more whitespace characters, then one or two digits, and then a 日 or 月.
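To see what each lookaround contributes, here is a minimal sketch on a shortened sample string (the sample and the variable names are made up for illustration). It assumes Python 3.7 or later, where re.split accepts patterns that match the empty string.

```python
import re

# Shortened, made-up sample for illustration
sample = "AAA。BBB。3日,CCC"

# Lookbehind only: a split point after every 。
after_every_stop = re.split(r'(?<=。)', sample)
print(after_every_stop)   # ['AAA。', 'BBB。', '3日,CCC']

# Lookbehind plus lookahead: a split point only where a date marker follows
before_dates_only = re.split(r'(?<=。)(?=\s*\d{1,2}[日月])', sample)
print(before_dates_only)  # ['AAA。BBB。', '3日,CCC']
```

Both patterns are zero-width, so no characters are consumed and every 。 stays attached to the piece before it.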
See the Python demo:
import re
text = "ZZZ。10月,AAA。11月2日,BBB。CCC。3日,DDD。EEE。12月,FFF"
print( re.split(r'(?<=。)(?=\s*\d{1,2}[日月])', text) )
# => ['ZZZ。', '10月,AAA。', '11月2日,BBB。CCC。', '3日,DDD。EEE。', '12月,FFF']
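If the split is needed in several places, one possible way to package it is a small helper with a precompiled pattern. This is only a sketch: the names DATE_BOUNDARY and split_on_date_markers are hypothetical, and it again assumes Python 3.7+ for splitting on an empty match.

```python
import re

# Zero-width boundary: right after 。 and right before 1-2 digits plus 日 or 月.
# (Hypothetical name, not part of the original answer.)
DATE_BOUNDARY = re.compile(r'(?<=。)(?=\s*\d{1,2}[日月])')

def split_on_date_markers(text):
    """Split text after each 。 that is followed by a day or month marker."""
    return DATE_BOUNDARY.split(text)

if __name__ == "__main__":
    text = "ZZZ。10月,AAA。11月2日,BBB。CCC。3日,DDD。EEE。12月,FFF"
    for part in split_on_date_markers(text):
        print(part)
```

A string with no matching boundary comes back as a one-element list, so callers do not need a special case for it.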
