I'm trying to use Python to find all strings that are enclosed in double quotes and have a domain-name-like format, such as "abc.def.ghi".
I am currently using re.findall('\"([a-z\\.]+[a-z]*)\"', input_string),
where [a-z\\.]+ is for abc. and def., and [a-z]* is for ghi.
So far it has no trouble matching strings like "abc.def.ghi", but it also matches strings that contain no dot, such as "opq" and "rst".
The question is: how do I exclude the strings that contain no dot using a regex?
CodePudding user response:
Pattern
'"([a-z]+(?:\.[a-z]+)+)"'
Explanation
- Start and end with a double quote
- capture group
- [a-z]+ one or more letters a-z
- (?:...) nested non-capturing subgroup of the capture group
- period followed by at least one letter a-z
- the nested subgroup is repeated at least once, so at least one dot is required
- make the subgroup non-capturing since otherwise findall returns a tuple of groups for each match instead of just the quoted text
Usage
import re

pattern = re.compile(r'"([a-z]+(?:\.[a-z]+)+)"')
tests = ['"abc.def.ghi"', '"opq"']
for input_string in tests:
    print(f"input_string: {input_string}, found: {pattern.findall(input_string)}")
Output
input_string: "abc.def.ghi", found: ['abc.def.ghi']
input_string: "opq", found: []
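Since the question asks about finding every match in one input string, here is a minimal sketch of the same pattern run against a single string that mixes dotted and undotted names. The sample text and the name DOTTED_NAME are made up for illustration; re.VERBOSE is used only so that each part of the pattern can carry a comment.

```python
import re

# Same idea as the pattern above, spelled out with re.VERBOSE
# (whitespace and "#" comments inside the pattern are ignored).
DOTTED_NAME = re.compile(
    r'''
    "                 # opening double quote
    (                 # capture group: the text between the quotes
      [a-z]+          # first label: one or more lowercase letters
      (?:\.[a-z]+)+   # one or more dot-plus-label parts, so a dot is required
    )
    "                 # closing double quote
    ''',
    re.VERBOSE,
)

# Hypothetical input, not taken from the question.
sample = 'hosts: "abc.def.ghi", "opq", "rst", "x.y"'
matches = DOTTED_NAME.findall(sample)
print(matches)  # ['abc.def.ghi', 'x.y']
```

Note that this only accepts lowercase a-z labels, as in the question; digits or hyphens would need a wider character class.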
CodePudding user response:
[a-z\\.]
This part matches any single character that is either a-z or a dot, so nothing forces a dot to appear. If you want the dot to be required, you will have to move it outside the character set, something like
([a-z]+\\.)
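To make the difference concrete, here is a small sketch comparing a dot inside the character class with a dot placed outside it. The sample string and the names loose and strict are made up for illustration, and the full strict pattern is one possible way to finish the idea, not necessarily what the answer had in mind.

```python
import re

# Hypothetical input, not taken from the question.
sample = '"abc.def.ghi" "opq"'

# Dot inside the character class: the dot is just one of the allowed
# characters, so a name with no dot at all still matches.
loose = re.findall(r'"([a-z.]+)"', sample)

# Dot outside the class: every "label." chunk must end in a literal dot.
# The inner group is non-capturing so findall returns the whole quoted text.
strict = re.findall(r'"((?:[a-z]+\.)+[a-z]+)"', sample)

print(loose)   # ['abc.def.ghi', 'opq']
print(strict)  # ['abc.def.ghi']
```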
