Find keyword in snake_case texts

Time:02-03

I need to check whether a word (for example "output") is present in a snake_case name.

This means the regex must match all of the following cases:

  • output
  • output_of_my_program
  • my_output_from_program
  • program_output

i.e. output, output_, _output_, and _output need to be matched.

Currently I have three individual regex patterns to cover all the cases:

  • "^output[^a-z]_?"
  • "_output_"
  • "_output$"

I have tried to combine the three into one pattern, [a-z]*_?[^a-z]output_?[a-z]*, but it fails for certain cases. Is it possible to combine the three patterns into one?

Edit: The other keywords I am interested in are "in", "input" and "out". The challenge is to avoid matching words such as "introspection" and cases such as "my_programoutput".
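To see which cases the combined pattern misses, a quick check along these lines may help. It is only a sketch: the pattern is copied from the question, the test names are the four listed above, and the variable names are made up for illustration.

```python
import re

# Combined pattern quoted from the question.
combined = re.compile(r"[a-z]*_?[^a-z]output_?[a-z]*")

cases = [
    "output",
    "output_of_my_program",
    "my_output_from_program",
    "program_output",
]

# [^a-z] has to consume one character right before "output", so a name
# that starts with the keyword has nothing for it to match.
results = {s: bool(combined.search(s)) for s in cases}
for s, matched in results.items():
    print(f"{s}: {matched}")
```

The two names that begin with "output" are the ones that fail, because [^a-z] is not optional.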

CodePudding user response:

You don't include it in your test cases, but I assume part of the intent is to explicitly not match a string like my_floutputs_from_program. You could do this with a tricky regex, but I'd just use split and in:

for s in (
    'output', 
    'output_of_my_program', 
    'my_output_from_program', 
    'program_output', 
    'input', 
    'my_floutputs_from_program'
):
    print(f"{s}: {'output' in s.split('_')}")

Output

output: True
output_of_my_program: True
my_output_from_program: True
program_output: True
input: False
my_floutputs_from_program: False
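The same split approach should extend to the other keywords from the question's edit. A minimal sketch, assuming the names only ever use "_" as a separator; KEYWORDS and find_keywords are hypothetical names, not part of the original answer:

```python
# Keywords taken from the question; the set and function names are illustrative.
KEYWORDS = {"in", "input", "out", "output"}

def find_keywords(name, keywords=KEYWORDS):
    """Return the keywords that appear as whole "_"-separated tokens in name."""
    return keywords & set(name.split("_"))

for s in ("my_output_from_program", "read_in_out", "introspection", "my_programoutput"):
    print(f"{s}: {sorted(find_keywords(s))}")
```

Because the comparison is against whole tokens, "introspection" and "my_programoutput" come back empty.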

CodePudding user response:

Is there a reason why 'output' in s would not work? You do not have to use a regex to solve this unless the project requires it.

>>> strings = [
...     'output',
...     'output_of_my_program',
...     'my_output_from_program',
...     'program_output',
...     'input'
... ]
>>>
>>> for s in strings:
...     print('output' in s)
...
True
True
True
True
False
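One caveat, going by the question's edit: a plain substring test also succeeds when the keyword is buried inside a longer word. The sketch below compares it with a whole-token test; has_substring and has_token are hypothetical helper names used only for this comparison.

```python
def has_substring(name, keyword):
    # Plain substring test: true if keyword occurs anywhere in name.
    return keyword in name

def has_token(name, keyword):
    # Whole-token test: keyword must equal one of the "_"-separated parts.
    return keyword in name.split("_")

pairs = [
    ("my_programoutput", "output"),
    ("introspection", "in"),
    ("program_output", "output"),
]
for name, keyword in pairs:
    print(f"{name} / {keyword}: "
          f"substring={has_substring(name, keyword)}, token={has_token(name, keyword)}")
```

So the substring test is fine only if partial matches such as "my_programoutput" are acceptable.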

CodePudding user response:

If you can use the re library, you can just call its search function with the string you want to find.

import re

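txt = "my_output_from_program"
# re.search returns a match object if "output" occurs anywhere in txt, else None
result = re.search("output", txt)
print(result is not None)  # True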

CodePudding user response:

If you don't want partial matches, you can use a pattern that requires, on each side of the keyword, either a word boundary or a positive lookaround assertion for _ (a lookbehind on the left, a lookahead on the right).

(?:\b|(?<=_))output(?:\b|(?=_))

The pattern matches:

  • (?:\b|(?<=_)) Match either a word boundary or assert _ to the left
  • output Match literally
  • (?:\b|(?=_)) Match either a word boundary, or assert _ to the right


import re

pattern = r"(?:\b|(?<=_))output(?:\b|(?=_))"
strings = [
    "output",
    "output_of_my_program",
    "my_output_from_program",
    "program_output",
    "my_programoutput"
]

for s in strings:
    m = re.search(pattern, s)
    if m:
        print(f"{s} --> {m.group()}")

Output

output --> output
output_of_my_program --> output
my_output_from_program --> output
program_output --> output
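
For the other keywords in the question's edit, the same boundaries can wrap an alternation. This is a sketch rather than a tested drop-in: build_pattern is a hypothetical helper, and the keyword list is the one from the question.

```python
import re

def build_pattern(keywords):
    """Compile one pattern that matches any keyword as a whole snake_case token."""
    # re.escape guards against keywords that contain regex metacharacters.
    alternation = "|".join(re.escape(k) for k in sorted(keywords, key=len, reverse=True))
    # Same boundaries as above: a word boundary or "_" on each side.
    return re.compile(rf"(?:\b|(?<=_))(?:{alternation})(?:\b|(?=_))")

pattern = build_pattern(["in", "input", "out", "output"])

for s in ("read_input", "program_output", "introspection", "my_programoutput"):
    m = pattern.search(s)
    print(f"{s} --> {m.group() if m else None}")
```

Sorting longest first is not strictly needed, since the right-hand assertion already rejects "in" inside "input", but it keeps the alternation easy to reason about.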