Extracting last occurrence of a certain string from each row in a pandas series

Time:01-28

From each row of a pandas Series, I want to extract the last occurrence of the word "User" together with the number that follows right after it. Everything else can be discarded. How would you do this? Thanks!

Here's an example of the series:

0                         1 - Unassigned, 2 - User 397335
1         1 - Unassigned, 2 - User 525767, 3 - Unassigned
2                                          1 - Unassigned
3                                          1 - Unassigned
4                                          1 - Unassigned
                               ...                       
163678                                     1 - Unassigned
163679    1 - Unassigned, 2 - User 347991, 3 - Unassigned
163680                                     1 - Unassigned
163681                                     1 - Unassigned
163682    1 - Unassigned, 2 - User 663455, 3 - Unassigned

CodePudding user response:

Use str.findall to collect every match in each row, then take the last one with .str[-1]:

>>> df['A'].str.findall(r'User \d+').str[-1]

0         User 397335
1         User 525767
2                 NaN
3                 NaN
4                 NaN
163678            NaN
163679    User 347991
163680            NaN
163681            NaN
163682    User 663455
Name: A, dtype: object
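
For anyone who wants to try this without the original data, here is a minimal self-contained sketch. The sample Series `s` is made up to mirror the rows in the question (the last row, with two "User" entries, is an added assumption to exercise the "last occurrence" case), and the names `last_user` and `last_id` are only illustrative. The second variant, using str.extract with a greedy prefix, is an alternative not taken from the answer above; it may be handy if only the number is needed.

```python
import pandas as pd

# Hypothetical sample data modeled on the question; the final row is an
# invented case with two "User" entries to show that the last one wins.
s = pd.Series(
    [
        "1 - Unassigned, 2 - User 397335",
        "1 - Unassigned, 2 - User 525767, 3 - Unassigned",
        "1 - Unassigned",
        "1 - Unassigned, 2 - User 111111, 3 - User 222222",
    ],
    name="A",
)

# findall returns a list of all matches per row; .str[-1] picks the last
# element and gives NaN for rows where the list is empty.
last_user = s.str.findall(r"User \d+").str[-1]

# Alternative: capture only the digits. The greedy ".*" consumes as much as
# possible, so the capture group ends up on the last "User <digits>" in the row.
last_id = s.str.extract(r".*User (\d+)", expand=False)

print(last_user)
print(last_id)
```

Rows without any "User" entry come out as NaN in both results, so a follow-up .dropna() can discard them if they are not needed.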