Re-Assigning values on a pandas DataFrame best practice

Time:12-28

I am working on the Medical Data Visualizer challenge and am reassigning DataFrame values based on some conditions, like this:

df['cholesterol'].loc[df['cholesterol'] == 1] = 0  # normalizing cholesterol values
df['cholesterol'].loc[df['cholesterol'] > 1] = 1   # normalizing cholesterol values

It seems to work; however, I am also using Jupyter Notebook, and when I run the cell containing this code I get the following warning:

/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py:670: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

Even though it works, I am wondering whether this is the best way to do it (best practice) or whether there is another way I should reassign those values.

I'm looking for the best practice here so that this doesn't cause errors in the future.

Thank you

EDIT:

id    age  gender  height  weight  ap_hi  ap_lo  cholesterol  gluc  smoke  alco  active  cardio
 0  18393       2     168    62.0    110     80            1     1      0     0       1       0
 1  20228       1     156    85.0    140     90            3     1      0     0       1       1
 2  18857       1     165    64.0    130     70            3     1      0     0       0       1
 3  17623       2     169    82.0    150    100            1     1      0     0       1       1
 4  17474       1     156    56.0    100     60            1     1      0     0       0       0

Answer:

Reference: Why does assignment fail when using chained indexing?

Use:

#        subset index          subset columns
df.loc[df['cholesterol'] == 1, 'cholesterol'] = 0
df.loc[df['cholesterol'] > 1, 'cholesterol'] = 1
print(df)

# Output
   cholesterol
0            0
1            1
2            1

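This avoids SettingWithCopyWarning because there is no chained indexing: the row mask and the column label are passed to a single .loc call, so the assignment is applied to df itself rather than to an intermediate object that may be a copy.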
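One thing worth keeping in mind is that the two .loc lines are order-sensitive: the second mask is computed from values the first line has already changed. In the order shown it works, but if the "> 1" line ran first, every value would become 1 and the "== 1" line would then turn them all into 0. A minimal sketch of an order-independent variant builds both masks from the original values before assigning anything (is_normal and is_high are illustrative names, not part of the original answer):

```python
import pandas as pd

df = pd.DataFrame({'cholesterol': [1, 2, 3]})

# Compute both boolean masks from the ORIGINAL values first, so the
# outcome no longer depends on which assignment runs first.
is_normal = df['cholesterol'] == 1
is_high = df['cholesterol'] > 1

# Single .loc call per assignment: row mask and column label together.
df.loc[is_high, 'cholesterol'] = 1
df.loc[is_normal, 'cholesterol'] = 0

print(df['cholesterol'].tolist())  # [0, 1, 1]
```

Swapping the two df.loc lines here gives the same result, because neither mask is recomputed after the first assignment.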

Setup:

df = pd.DataFrame({'cholesterol': [1, 2, 3]})
print(df)

# Output
   cholesterol
0            1
1            2
2            3
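A related pattern that commonly triggers the same warning is filtering rows into a new variable and then assigning into that variable. When a separate DataFrame is what you actually want, one common approach is to take an explicit .copy(), which makes it unambiguous that changes should not flow back to df. A minimal sketch, assuming made-up values for a gluc column used as the filter:

```python
import pandas as pd

# Hypothetical data: gluc values are invented for illustration.
df = pd.DataFrame({'cholesterol': [1, 2, 3], 'gluc': [1, 1, 2]})

# Explicit copy: edits to `subset` are intended NOT to reach `df`.
subset = df[df['gluc'] == 1].copy()
subset.loc[subset['cholesterol'] == 1, 'cholesterol'] = 0

print(df['cholesterol'].tolist())      # [1, 2, 3]  (original unchanged)
print(subset['cholesterol'].tolist())  # [0, 2]
```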

Update

With your sample:

>>> df[['id', 'age', 'cholesterol']]
   id    age  cholesterol
0   0  18393            0
1   1  20228            1
2   2  18857            1
3   3  17623            0
4   4  17474            0

>>> df
   id    age  gender  height  weight  ap_hi  ap_lo  cholesterol  gluc  smoke  alco  active  cardio
0   0  18393       2     168    62.0    110     80            0     1      0     0       1       0
1   1  20228       1     156    85.0    140     90            1     1      0     0       1       1
2   2  18857       1     165    64.0    130     70            1     1      0     0       0       1
3   3  17623       2     169    82.0    150    100            0     1      0     0       1       1
4   4  17474       1     156    56.0    100     60            0     1      0     0       0       0
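Since the goal is simply "1 becomes 0, anything greater than 1 becomes 1", the whole column can also be replaced in one vectorized assignment, which sidesteps both chained indexing and the ordering issue. A sketch, assuming cholesterol only contains integers of 1 or more (as in the sample), so that "equal to 1" and "not greater than 1" mean the same thing:

```python
import pandas as pd

# The id and cholesterol columns from the question's five sample rows.
df = pd.DataFrame({'id': [0, 1, 2, 3, 4],
                   'cholesterol': [1, 3, 3, 1, 1]})

# Assumes values are integers >= 1. The comparison yields True where
# cholesterol > 1; astype(int) maps True -> 1 and False -> 0, and the
# result replaces the entire column in a single assignment.
df['cholesterol'] = (df['cholesterol'] > 1).astype(int)

print(df['cholesterol'].tolist())  # [0, 1, 1, 0, 0]
```

numpy.where(df['cholesterol'] > 1, 1, 0) would be an equivalent way to build the new column under the same assumption.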