Here is my df:
  personUID LR_Value_y          diagnosis_y
0      abc1      10 10  ICD10_R99 ICD10_R98
1      abc5        200            ICD10_R99
2      abc1      10 10  ICD10_R99 ICD10_R98
3      abc2         15            ICD10_R98
4      abc3         14            ICD10_R97
5      abc4        100            ICD10_R97
How can I add the two values in "10 10" to get 20?
CodePudding user response:
For each row, you can split the string on white space, convert each number from string literal to integer, and add them.
One way of doing the above, using a list comprehension:
df['LR_Value_y'] = [sum(int(x) for x in string.split()) for string in df['LR_Value_y']]
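To make the per-row logic concrete, here is a small walk-through on a single cell. The helper name sum_tokens is hypothetical (it is not part of pandas), and the sketch assumes each cell is a string of whitespace-separated integers.

```python
def sum_tokens(cell: str) -> int:
    """Sum the whitespace-separated integers in one cell (hypothetical helper)."""
    tokens = cell.split()                # "10 10" -> ["10", "10"]
    numbers = [int(t) for t in tokens]   # ["10", "10"] -> [10, 10]
    return sum(numbers)                  # [10, 10] -> 20


print(sum_tokens("10 10"))  # 20
print(sum_tokens("200"))    # 200
```

The list comprehension applies exactly these three steps to every element of the column.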
Another way, using the str.split and explode methods:
df['LR_Value_y'] = df['LR_Value_y'].str.split().explode().astype(int).groupby(level=0).sum()
Output:
  personUID  LR_Value_y          diagnosis_y
0      abc1          20  ICD10_R99 ICD10_R98
1      abc5         200            ICD10_R99
2      abc1          20  ICD10_R99 ICD10_R98
3      abc2          15            ICD10_R98
4      abc3          14            ICD10_R97
5      abc4         100            ICD10_R97
Note that this changes the dtype of the LR_Value_y column from strings to integers. If you need each element as type str, you can convert back with astype(str) (but I don't think you really want that).
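As a rough illustration of the dtype change, the sketch below rebuilds only the LR_Value_y column (assumed here to hold strings, as in the question) and checks the result before and after converting back to text. The variable names are made up for the example.

```python
import pandas as pd

# Assumed input: the LR_Value_y column from the question, stored as strings.
s = pd.Series(["10 10", "200", "10 10", "15", "14", "100"], name="LR_Value_y")

# Split each cell, put one token per row, convert to int, then sum per original row.
summed = s.str.split().explode().astype(int).groupby(level=0).sum()
print(summed.tolist())                         # [20, 200, 20, 15, 14, 100]
print(pd.api.types.is_integer_dtype(summed))   # True

# Optional: convert back to strings if downstream code expects text.
as_text = summed.astype(str)
print(as_text.tolist())                        # ['20', '200', '20', '15', '14', '100']
```

Keeping the integer column is usually more convenient, since later arithmetic or aggregation works on it directly.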
CodePudding user response:
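You can replace each run of whitespace with + and eval the resulting expression (regex=True is needed on recent pandas versions, where str.replace treats the pattern literally by default):
df['LR_Value_y'] = pd.eval(df['LR_Value_y'].str.replace(r'\s+', '+', regex=True))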
print(df)
# Output
  personUID  LR_Value_y          diagnosis_y
0      abc1          20  ICD10_R99 ICD10_R98
1      abc5         200            ICD10_R99
2      abc1          20  ICD10_R99 ICD10_R98
3      abc2          15            ICD10_R98
4      abc3          14            ICD10_R97
5      abc4         100            ICD10_R97
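It can help to look at the intermediate strings before they are evaluated. The sketch below shows only the replacement step, on an assumed copy of the LR_Value_y column. The extra str.strip() call is my addition, not part of the answer above: it guards against a trailing space turning into a dangling "+".

```python
import pandas as pd

# Assumed input: the LR_Value_y column from the question, stored as strings.
s = pd.Series(["10 10", "200", "10 10", "15", "14", "100"], name="LR_Value_y")

# Strip first so trailing whitespace cannot leave a "+" at the end,
# then turn each run of whitespace into a single "+".
expressions = s.str.strip().str.replace(r"\s+", "+", regex=True)
print(expressions.tolist())
# ['10+10', '200', '10+10', '15', '14', '100']
```

Each element is now a small arithmetic expression, which is what the eval step consumes. Because eval-style approaches run whatever text is in the column, they are best kept to data you trust.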
