Home > database >  Python: Only keep section of string after second dash and before third dash
Python: Only keep section of string after second dash and before third dash

Time:01-12

I have a column 'Id' that has data like this:

'10020-100-700-800-2'

How can I create a new column that would only contain the third number, in this case 700, for each row?

Here is an example dataframe:

d = {'id': {0: '10023_11_762553_762552_11', 1: '10023_14_325341_359865_14', 2: '10023_17_771459_771453_17', 3: '10023_20_440709_359899_20', 4: '10023_24_773107_625033_24', 5: '10023_27_771462_771463_27', 6: '10023_30_771262_771465_30', 7: '10023_33_761971_762470_33'}, 'values': {0: 10023, 1: 10023, 2: 10023, 3: 10023, 4: 10023, 5: 10023, 6: 10023, 7: 10023}}

CodePudding user response:

Use str.split and take the third argument of the list:

df = pd.DataFrame({'Col': ['10020-100-700-800-2']})
df['NewCol'] = df['Col'].str.split('-').str[2].astype(int)
print(df)

# Output
                   Col  NewCol
0  10020-100-700-800-2     700

Update with your sample:

data = {'Id': ['10020-100-700-800-2',
       '10022-400-900-900-2',
       '10045-600-800-900-3',
       '10000-300-400-300-3',
       '10020-200-200-200-2'],
      'Employed': [1, 0, 0, 1, 1],
      'Name': ['Alan', 'Joe', 'Sam', 'Amy', 'Chloe']}
df = pd.DataFrame(data)

df['Id2'] = df['Id'].str.split('-').str[2].astype(int)
print(df)

# Output
                    Id  Employed   Name  Id2
0  10020-100-700-800-2         1   Alan  700
1  10022-400-900-900-2         0    Joe  900
2  10045-600-800-900-3         0    Sam  800
3  10000-300-400-300-3         1    Amy  400
4  10020-200-200-200-2         1  Chloe  200

Update 2 with your new data

data = {'id': ['10023_11_762553_762552_11',
               '10023_14_325341_359865_14',
               '10023_17_771459_771453_17',
               '10023_20_440709_359899_20',
               '10023_24_773107_625033_24',
               '10023_27_771462_771463_27',
               '10023_30_771262_771465_30',
               '10023_33_761971_762470_33'],
        'values': [10023, 10023, 10023, 10023, 10023, 10023, 10023, 10023]}
df = pd.DataFrame(data)

df['id2'] = df['id'].str.split('_').str[2].astype(int)
print(df)

# Output
                          id  values     id2
0  10023_11_762553_762552_11   10023  762553
1  10023_14_325341_359865_14   10023  325341
2  10023_17_771459_771453_17   10023  771459
3  10023_20_440709_359899_20   10023  440709
4  10023_24_773107_625033_24   10023  773107
5  10023_27_771462_771463_27   10023  771462
6  10023_30_771262_771465_30   10023  771262
7  10023_33_761971_762470_33   10023  761971
  •  Tags:  
  • Related