How to sum multiple variables within the same row under the same column in Python?

Time:01-13

Here is my df:

  personUID LR_Value_y          diagnosis_y
0      abc1      10 10  ICD10_R99 ICD10_R98
1      abc5        200            ICD10_R99
2      abc1      10 10  ICD10_R99 ICD10_R98
3      abc2         15            ICD10_R98
4      abc3         14            ICD10_R97
5      abc4        100            ICD10_R97

How can I add those two "10 10" values to get 20?

CodePudding user response:

For each row, you can split the string on whitespace, convert each piece from a string to an integer, and add them up.

One way to do this is with a list comprehension:

df['LR_Value_y'] = [sum(int(x) for x in string.split()) for string in df['LR_Value_y']]

Another way uses the str.split and explode methods:

df['LR_Value_y'] = df['LR_Value_y'].str.split().explode().astype(int).groupby(level=0).sum()

Output:

  personUID  LR_Value_y          diagnosis_y
0      abc1          20  ICD10_R99 ICD10_R98
1      abc5         200            ICD10_R99
2      abc1          20  ICD10_R99 ICD10_R98
3      abc2          15            ICD10_R98
4      abc3          14            ICD10_R97
5      abc4         100            ICD10_R97

Note that this changes the dtype of the LR_Value_y column: it held strings before and holds integers afterwards. If you need each element as a str again, you can convert back with astype(str) (but I don't think you really want that).
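
For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the frame from the question and runs both approaches side by side. The variable names via_list, via_explode, dtype_before and dtype_after are only illustrative, and the data is typed in by hand from the post, so treat it as a reproduction under those assumptions rather than the asker's actual setup.

```python
import pandas as pd

# Rebuild the frame from the question (values copied by hand from the post).
df = pd.DataFrame({
    "personUID": ["abc1", "abc5", "abc1", "abc2", "abc3", "abc4"],
    "LR_Value_y": ["10 10", "200", "10 10", "15", "14", "100"],
    "diagnosis_y": [
        "ICD10_R99 ICD10_R98", "ICD10_R99", "ICD10_R99 ICD10_R98",
        "ICD10_R98", "ICD10_R97", "ICD10_R97",
    ],
})

# Approach 1: split each string, convert the pieces to int, sum per row.
via_list = [sum(int(x) for x in s.split()) for s in df["LR_Value_y"]]

# Approach 2: split, explode to one number per line, then sum the pieces
# that share the same original index label.
via_explode = (
    df["LR_Value_y"].str.split().explode().astype(int).groupby(level=0).sum()
)

# Record the dtype before and after assignment to see the change.
dtype_before = df["LR_Value_y"].dtype
df["LR_Value_y"] = via_explode
dtype_after = df["LR_Value_y"].dtype

print(via_list)  # [20, 200, 20, 15, 14, 100]
print(dtype_before, "->", dtype_after)
```

Both approaches should give the same numbers. The explode version relies on the repeated index labels that explode produces, which is why groupby(level=0) can put the pieces back together.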

CodePudding user response:

You can replace the whitespace with + and evaluate the resulting expression:

df['LR_Value_y'] = pd.eval(df['LR_Value_y'].str.replace(r'\s+', '+', regex=True))
print(df)

# Output
  personUID  LR_Value_y          diagnosis_y
0      abc1          20  ICD10_R99 ICD10_R98
1      abc5         200            ICD10_R99
2      abc1          20  ICD10_R99 ICD10_R98
3      abc2          15            ICD10_R98
4      abc3          14            ICD10_R97
5      abc4         100            ICD10_R97
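
To see what the replacement step actually hands over for evaluation, this small sketch prints the intermediate strings after each run of whitespace is turned into a + sign. The Series name values and the variable expressions are made up for illustration; the sample values are copied from the question.

```python
import pandas as pd

# Sample values copied from the LR_Value_y column in the question.
values = pd.Series(["10 10", "200", "10 10", "15", "14", "100"])

# Collapse each run of whitespace into a single "+" sign.
# regex=True is passed explicitly because newer pandas versions treat
# the pattern as a literal string by default.
expressions = values.str.replace(r"\s+", "+", regex=True)

print(expressions.tolist())
# ['10+10', '200', '10+10', '15', '14', '100']
```

Rows with a single number come through unchanged, so only the multi-value rows turn into an addition.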