I'm trying to extract any strings with an underscore _ in the middle. For example, from the string s below:
s = "name1_name2 _ nothing test1 _ test 2_ _3"
I would like to extract name1_name2.
Thank you for reading!
CodePudding user response:
>>> import re
>>> s = 'name1_name2 _ nothing test1 _ test 2_ _3 name3_name4'
>>> re.findall('[a-zA-Z0-9]+_[a-zA-Z0-9]+', s)
['name1_name2', 'name3_name4']
import re: this imports the regular expression module, which is part of the standard library. I suggest you get familiar with it; it's useful in many situations.
re.findall: the findall function from the re module returns all non-overlapping matches of the pattern in the string, as a list of strings or tuples.
[a-zA-Z0-9]+_[a-zA-Z0-9]+: this regex means one or more ASCII letters (lowercase or uppercase) or digits, followed by an underscore, followed by one or more ASCII letters or digits.
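As a minimal sketch of the explanation above, the pattern can be compiled once and wrapped in a small helper. The names PATTERN and extract_underscored are made up for illustration and are not part of the re module; the input is the string from the question.

```python
import re

# One or more ASCII letters/digits, a single underscore, then one or more
# ASCII letters/digits. PATTERN is a hypothetical name for this sketch.
PATTERN = re.compile(r'[a-zA-Z0-9]+_[a-zA-Z0-9]+')

def extract_underscored(text):
    """Return every substring shaped like <alnum>_<alnum> (hypothetical helper)."""
    return PATTERN.findall(text)

s = "name1_name2 _ nothing test1 _ test 2_ _3"
print(extract_underscored(s))  # ['name1_name2']
```

The lone "_", "2_" and "_3" are skipped because the pattern needs at least one letter or digit on both sides of the underscore.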
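The regex \w+_\w+ might have unintended consequences, because in Python 3 \w matches any Unicode word character by default, not just ASCII letters and digits. Look at the differences below: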
>>> s = 'name1_name2 _ nothing test1 _ test 2_ _3 name3_name4 ˆd_d'
>>> re.findall('[a-zA-Z0-9]+_[a-zA-Z0-9]+', s)
['name1_name2', 'name3_name4', 'd_d']
>>> re.findall(r'\w+_\w+', s)
['name1_name2', 'name3_name4', 'ˆd_d']
One might say that you can use \w with the ASCII flag, which makes it equivalent to [a-zA-Z0-9_]. But note that this class includes the underscore, which can also have unintended consequences:
>>> s = 'name1_name2 _ nothing test1 _ test 2_ _3 name3_name4 ˆd_d_'
>>> re.findall(r'\w+_\w+', s, re.ASCII)
['name1_name2', 'name3_name4', 'd_d_']
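If partial matches such as 'd_d' (cut out of 'ˆd_d') or 'd_d_' are not wanted at all, one possible alternative is to split on whitespace and accept only tokens that match the pattern in full. This is a different technique from the findall calls above, not something from the original answer, and it assumes the names of interest are separated by whitespace. TOKEN and underscored_tokens are hypothetical names.

```python
import re

# Same character classes as above, but applied to whole tokens.
TOKEN = re.compile(r'[a-zA-Z0-9]+_[a-zA-Z0-9]+')

def underscored_tokens(text):
    """Keep whitespace-separated tokens that are exactly <alnum>_<alnum>."""
    # fullmatch succeeds only if the entire token fits the pattern,
    # so 'ˆd_d' and 'd_d_' are rejected rather than trimmed.
    return [tok for tok in text.split() if TOKEN.fullmatch(tok)]

s = 'name1_name2 _ nothing test1 _ test 2_ _3 name3_name4 ˆd_d d_d_'
print(underscored_tokens(s))  # ['name1_name2', 'name3_name4']
```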
CodePudding user response:
How's this?
[\w]+_[\w]+
Does that work for you?
CodePudding user response:
You can use the regex given below. It selects a run of non-whitespace characters containing an _; note that ^ anchors the match to the start of the input.
^\S+[_]\S+
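A quick check of how this anchored pattern behaves, assuming it is meant as ^\S+[_]\S+ (the quantifiers are an assumption). The sketch below suggests two things worth knowing: the ^ anchor means only a name at the very start of the string is found, and \S+ also accepts names with more than one underscore.

```python
import re

# Assumed form of the pattern from the answer above.
ANCHORED = r'^\S+[_]\S+'

# The name is at the start of the string, so it is found.
print(re.findall(ANCHORED, 'name1_name2 _ nothing test1'))  # ['name1_name2']

# The same name later in the string is missed because of the ^ anchor.
print(re.findall(ANCHORED, 'x name1_name2'))  # []

# \S also matches underscores, so more than one underscore is accepted.
print(re.findall(ANCHORED, 'a_b_c'))  # ['a_b_c']
```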
