The text is like "1-2years. 3years. 10years."
I want to get the result [(1,2),(3),(10)].
I am using Python.
I first tried r"([0-9]?)[-]?([0-9])years". It works well except for the case of 10. I also tried r"([0-9]?)[-]?([0-9]|10)years" but the result is still [(1,2),(3),(1,0)].
CodePudding user response:
This should work:
import re
st = '1-2years. 3years. 10years.'
result = [tuple(e for e in tup if e)
          for tup in re.findall(r'(?:(\d+)-(\d+)|(\d+))years', st)]
# [('1', '2'), ('3',), ('10',)]
The regex will look for either one number, or two separated by a hyphen, immediately prior to the word years. If we give this to re.findall(), it will give us the output [('1', '2', ''), ('', '', '3'), ('', '', '10')], so we also use a quick list comprehension to filter out the empty strings.
Alternatively, we could use r'(\d+)(?:-(\d+))?years' to basically the same effect, which is closer to what you've already tried.
CodePudding user response:
You can use this pattern: (?:(\d+)-)?(\d+)years
Code:
import re
pattern = r"(?:(\d+)-)?(\d+)years"
text = "1-2years. 3years. 10years."
print([tuple(int(z) for z in x if z) for x in re.findall(pattern, text)])
Output:
[(1, 2), (3,), (10,)]
CodePudding user response:
Your attempt r"([0-9]?)[-]?([0-9])years" doesn't work for the case of 10 because you ask it to match one (or zero) digit per group.
You also don't need the hyphen in brackets.
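To see the failure concretely, here is a small sketch that runs both patterns from the question against the sample text. The variable names `first_attempt` and `second_attempt` are only illustrative. In each case the first group takes the `1` of `10` before the second group gets a chance, so the alternative `10` is never tried.

```python
import re

text = "1-2years. 3years. 10years."

# Each group matches at most one digit, so "10" is split across both groups.
first_attempt = re.findall(r"([0-9]?)[-]?([0-9])years", text)
print(first_attempt)   # [('1', '2'), ('', '3'), ('1', '0')]

# Adding "|10" to the second group does not help: group 1 has already
# consumed the "1", and "[0-9]" then matches the "0" before "10" is tried.
second_attempt = re.findall(r"([0-9]?)[-]?([0-9]|10)years", text)
print(second_attempt)  # [('1', '2'), ('', '3'), ('1', '0')]
```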
This should work: Regex101
(\d+)(?:-(\d+))?years
Explanation:
(\d+): Capturing group for one or more digits
(?: ): Non-capturing group
-: hyphen
(\d+): Capturing group for one or more digits
(?: )?: Make the previous non-capturing group optional
In python:
import re
result = re.findall(r"(\d+)(?:-(\d+))?years", "1-2years. 3years. 10years.")
# Gives: [('1', '2'), ('3', ''), ('10', '')]
Each tuple in the list contains two elements: The number on the left side of the hyphen, and the number on the right side of the hyphen. Removing the blank elements is quite easy: you loop over each item in result, then you loop over each match in this item and only select it (and convert it to int) if it is not empty.
final_result = [tuple(int(match) for match in item if match) for item in result]
# gives: [(1, 2), (3,), (10,)]
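As a variation, the same pattern can be wrapped in a small helper built on re.finditer. This is only a sketch: the names `YEARS_PATTERN` and `extract_year_ranges` are made up for illustration. One difference worth knowing is that `match.groups()` reports an optional group that did not take part in the match as `None`, whereas re.findall reports it as an empty string.

```python
import re

# Hypothetical names, chosen for this example only.
YEARS_PATTERN = re.compile(r"(\d+)(?:-(\d+))?years")

def extract_year_ranges(text):
    """Return a list of int tuples, one per '<n>years' or '<n>-<m>years' match."""
    ranges = []
    for m in YEARS_PATTERN.finditer(text):
        # groups() gives None for the optional second number when it is absent.
        ranges.append(tuple(int(g) for g in m.groups() if g is not None))
    return ranges

print(extract_year_ranges("1-2years. 3years. 10years."))
# [(1, 2), (3,), (10,)]
```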
CodePudding user response:
You only match a single digit as the character class [0-9] is not repeated.
Another option is to match the first digits with an optional part for - and digits.
\b(\d+(?:-\d+)?)years\.
\b A word boundary
( Capture group 1 (which will be returned by re.findall)
\d+(?:-\d+)? Match 1+ digits and optionally match - and again 1+ digits
) Close group 1
years\. Match literally with the escaped .
Then you can split the matches on -
import re
pattern = r"\b(\d+(?:-\d+)?)years\."
s = "1-2years. 3years. 10years."
res = [tuple(v.split('-')) for v in re.findall(pattern, s)]
print(res)
Output
[('1', '2'), ('3',), ('10',)]
Or if a list of lists is also ok instead of tuples
res = [v.split('-') for v in re.findall(pattern, s)]
Output
[['1', '2'], ['3'], ['10']]
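If integers are wanted, as in the question, the split pieces can be converted while building the tuples. The sketch below also shows one consequence of this pattern: because it ends in `\.`, a number followed by "years" without a trailing period is not matched. The names `res_int` and `no_period` are illustrative only.

```python
import re

pattern = r"\b(\d+(?:-\d+)?)years\."
s = "1-2years. 3years. 10years."

# Split each captured string on "-" and convert the pieces to int.
res_int = [tuple(int(n) for n in v.split('-')) for v in re.findall(pattern, s)]
print(res_int)    # [(1, 2), (3,), (10,)]

# "3years" is followed by a space rather than a period, so it is skipped.
no_period = re.findall(pattern, "3years and 10years.")
print(no_period)  # ['10']
```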
