Pandas dataframe and assigning values

Time:02-11

There is a behavior of pandas DataFrames that I can't explain. I hope someone can walk me through it.

import numpy as np
import pandas as pd
df = pd.DataFrame(np.array([[1, 5, 10]]), columns=["Jan", "Fév", "Mar"])
df2 = pd.DataFrame(np.array([[4, 4, 4]]), columns=["Jan", "Fév", "Mar"])

df
    Jan Fév Mar
0   1   5   10

df2
    Jan Fév Mar
0   4   4   4

So the boolean masks df < df2 and df >= df2 are, respectively:

df < df2
    Jan     Fév     Mar
0   True    False   False

df >= df2
    Jan     Fév     Mar
0   False   True    True

However, if I run this sequence of code:

df3 = df2
df3[df < df2] = 0
df3[df >= df2] = 7

I get this result:

df3
    Jan Fév Mar
0   7   7   7

df2
    Jan Fév Mar
0   7   7   7

My question is: why does my code also modify the values of df2?

Is it because of the df3 = df2?

CodePudding user response:

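In pandas there is a difference between views and copies, but plain assignment with = creates neither: df3 = df2 only binds a second name to the same DataFrame object, so every change made through df3 is also visible through df2. That is also why every value ends up as 7: the first assignment sets Jan to 0 in df2 as well, so df >= df2 is then True in every column. To get an independent object, use .copy(). Consider the following simple example: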

import pandas as pd
df1 = pd.DataFrame({'x':[1,2,3]})
df2 = df1
df3 = df1.copy()
df3['x'] = 0
print(df1)

output

   x
0  1
1  2
2  3

then

df2['x'] = 0
print(df1)

gives output

   x
0  0
1  0
2  0

If you want to know more, read "Views and Copies in pandas" in Practical Data Science.
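Applied to the code in the question, a minimal sketch of the fix could look like this. It assumes the goal is to leave df2 untouched and to end up with 0 where df < df2 and 7 elsewhere; the names lower and not_lower are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 5, 10]]), columns=["Jan", "Fév", "Mar"])
df2 = pd.DataFrame(np.array([[4, 4, 4]]), columns=["Jan", "Fév", "Mar"])

# Compute both masks up front, before any values are changed
lower = df < df2
not_lower = df >= df2

# .copy() creates a new DataFrame with its own data,
# whereas df3 = df2 would only add a second name for df2
df3 = df2.copy()
df3[lower] = 0
df3[not_lower] = 7

print(df3)  # Jan is 0, Fév and Mar are 7
print(df2)  # still 4 in every column
```

Computing the masks first is not strictly required once df3 is a copy, but it makes the result independent of the order of the two assignments.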

Note that Python's built-in collections behave the same way, e.g. dicts:

d1 = dict(x=1,y=2)
d2 = d1
d3 = d1.copy()
d3['x'] = 0
print(d1)  # {'x': 1, 'y': 2}
d2['x'] = 0
print(d1)  # {'x': 0, 'y': 2}
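If you are unsure whether two names refer to the same object, Python's is operator can tell you. The sketch below uses the hypothetical names alias and independent to contrast plain assignment with .copy() for both a dict and a DataFrame.

```python
import pandas as pd

d1 = dict(x=1, y=2)
d_alias = d1               # second name for the same dict
d_independent = d1.copy()  # new dict with the same items

print(d_alias is d1)        # True
print(d_independent is d1)  # False

df1 = pd.DataFrame({"x": [1, 2, 3]})
alias = df1               # second name for the same DataFrame
independent = df1.copy()  # new DataFrame with its own data

print(alias is df1)        # True
print(independent is df1)  # False

# A change made through the alias shows up in df1, but not in the copy
alias.loc[0, "x"] = 99
print(df1.loc[0, "x"])          # 99
print(independent.loc[0, "x"])  # 1
```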