My pandas column contains a series of URLs. I'd like to extract a substring from each URL.
A minimal reproducible example is below.
import pandas as pd
s = pd.Series(['https://url-location/img/xxxyyy_image1.png'])
s.apply(lambda x: x[x.find("/") + 1:x.find("_")])
I'd like to extract xxxyyy from each URL and store it in a new column.
CodePudding user response:
You can use
>>> s.str.extract(r'.*/([^_]+)')
        0
0  xxxyyy
See the regex demo. Details:
.* - zero or more chars other than line break chars, as many as possible
/ - a slash
([^_]+) - Capturing group 1 (the value captured into this group will be the actual return value of Series.str.extract): one or more chars other than the _ char.
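Since the question asks for a new column, here is a minimal sketch of how the same pattern might be applied to a DataFrame. The frame and the column names "url" and "name" are assumptions made for illustration; they are not from the question.

```python
import pandas as pd

# Hypothetical frame: the column names "url" and "name" are illustrative only.
df = pd.DataFrame({"url": ["https://url-location/img/xxxyyy_image1.png"]})

# With a single capture group, expand=False makes str.extract return a Series
# instead of a one-column DataFrame, so it can be assigned directly.
df["name"] = df["url"].str.extract(r".*/([^_]+)", expand=False)

print(df["name"].tolist())  # ['xxxyyy']
```

Rows that do not match the pattern would come back as NaN rather than raising an error.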
CodePudding user response:
Also possible:
s.str.split('/').str[-1].str.split('_').str[0]
# Out[224]: xxxyyy
This works because the .str accessor supports indexing and slice notation on each element.
For example, .str[-1] returns the last element of each list produced by the split.
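To make the chain easier to follow, here is a sketch that breaks it into intermediate steps. The second URL is a made-up row added only to show that each step is applied element by element.

```python
import pandas as pd

s = pd.Series([
    "https://url-location/img/xxxyyy_image1.png",
    "https://url-location/img/aaabbb_image2.png",  # hypothetical extra row
])

parts = s.str.split("/")              # each element becomes a list of path pieces
last = parts.str[-1]                  # last piece, e.g. "xxxyyy_image1.png"
result = last.str.split("_").str[0]   # text before the first underscore

print(result.tolist())  # ['xxxyyy', 'aaabbb']
```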
