I was experimenting with numpy when I noticed that:
import numpy as np
x = np.arange(4).reshape(2,2)
print(x)
x[[1,0]]=x
print(x)
Outputs:
[[0 1]
 [2 3]]
[[0 1]
 [0 1]]
Similarly:
import pandas as pd
df = pd.DataFrame({'x':[0,2],'y':[1,3]})
print(df)
df.iloc[[1,0]]=df
print(df)
Outputs:
   x  y
0  0  1
1  2  3
   x  y
0  0  1
1  0  1
Why?
Answer:
numpy does use buffering for operations like += (see the np.add.at docs for ways around that). But for simple assignment like this it doesn't.
y[[1,0]] = y+0 gives the buffered behavior explicitly: y+0 produces a new array, which is then assigned.
For what it's worth, assignment is implemented as y.__setitem__(idx, value). That method does not test whether value is the same as y (or, more generally, a view of it). It's up to you, the user, to be cognizant of such issues.
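To illustrate the buffering mentioned above, here is a small sketch comparing += with np.add.at when an index is repeated. The arrays a and b are just illustrative names; the point is that the buffered form reads the old values once, while np.add.at applies the operation once per index occurrence.

```python
import numpy as np

# Buffered: a[idx] += 1 behaves like a[idx] = a[idx] + 1, so the repeated
# index 0 is read once and written twice with the same value.
a = np.zeros(3, dtype=int)
a[[0, 0, 1]] += 1
print(a)  # [1 1 0]

# Unbuffered: np.add.at adds once for every occurrence of an index.
b = np.zeros(3, dtype=int)
np.add.at(b, [0, 0, 1], 1)
print(b)  # [2 1 0]
```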
A generalization of your example is to assign a view:
In [99]: y=np.arange(4).reshape(2,2)
In [100]: y[[1,0]] = y[::-1,::-1]
In [101]: y
Out[101]:
array([[1, 1],
       [3, 3]])
y[::-1,::-1] is a new array object, but it is a view that shares the underlying data buffer with y. So the assignment to y ends up modifying this view incrementally, while values are still being read from it.
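One way to check which expressions produce views is np.shares_memory. A minimal sketch, contrasting the reversed slice with the y+0 temporary (rev and tmp are illustrative names):

```python
import numpy as np

y = np.arange(4).reshape(2, 2)

rev = y[::-1, ::-1]  # basic slicing: a view onto y's data buffer
tmp = y + 0          # arithmetic: a freshly allocated array

print(np.shares_memory(rev, y))  # True
print(np.shares_memory(tmp, y))  # False
```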
Making a copy avoids that:
In [102]: y=np.arange(4).reshape(2,2)
In [103]: y[[1,0]] = y[::-1,::-1].copy()
In [104]: y
Out[104]:
array([[1, 0],
       [3, 2]])
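If you want this defensive copy to happen automatically, a small wrapper can do it. This is only a sketch: safe_setitem is a hypothetical helper, not a numpy function, and it relies on np.may_share_memory, which is a conservative (bounds-based) check and may occasionally copy when it isn't strictly needed.

```python
import numpy as np

def safe_setitem(target, idx, value):
    """Hypothetical helper: assign value to target[idx], copying value
    first if it might share memory with target."""
    value = np.asarray(value)
    if np.may_share_memory(target, value):
        value = value.copy()  # break the aliasing before assigning
    target[idx] = value
    return target

# The reversed-view example from above.
y = np.arange(4).reshape(2, 2)
safe_setitem(y, [1, 0], y[::-1, ::-1])
print(y)
# [[1 0]
#  [3 2]]

# The case from the question: assigning an array to itself.
z = np.arange(4).reshape(2, 2)
safe_setitem(z, [1, 0], z)
print(z)
# [[2 3]
#  [0 1]]
```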
More often I've seen questions about a double assignment like this (starting again from y=np.arange(4).reshape(2,2)):
In [106]: y[0],y[1] = y[1],y[0]
In [107]: y
Out[107]:
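array([[2, 3],
       [2, 3]])
Again, the right-hand side holds views of y. The first assignment, y[0] = y[1], overwrites row 0, so the view y[0] on the right-hand side already contains [2, 3] by the time it is assigned to y[1].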
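The tuple assignment can be unrolled into the steps Python actually performs. In this sketch rhs is just an illustrative name for the tuple Python builds before assigning anything:

```python
import numpy as np

y = np.arange(4).reshape(2, 2)

# Python evaluates the right-hand side first: a tuple of two views of y.
rhs = (y[1], y[0])

# Then it assigns the items left to right.
y[0] = rhs[0]  # copies [2, 3] into row 0; rhs[1] (a view of row 0) now sees [2, 3]
y[1] = rhs[1]  # copies the already-overwritten row 0 into row 1
print(y)
# [[2 3]
#  [2 3]]
```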
The correct switching assignment is:
In [108]: y=np.arange(4).reshape(2,2)
In [109]: y[[0,1]] = y[[1,0]]
In [110]: y
Out[110]:
array([[2, 3],
       [0, 1]])
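This swap is safe because advanced (list) indexing such as y[[1,0]] always returns a copy, whereas a plain integer index such as y[1] returns a view. A quick check with np.shares_memory, plus an equivalent row swap that copies explicitly (w is an illustrative name):

```python
import numpy as np

y = np.arange(4).reshape(2, 2)
print(np.shares_memory(y[1], y))       # True: integer index gives a view
print(np.shares_memory(y[[1, 0]], y))  # False: advanced indexing gives a copy

# Copying both rows before assigning also makes the tuple swap work.
w = np.arange(4).reshape(2, 2)
w[0], w[1] = w[1].copy(), w[0].copy()
print(w)
# [[2 3]
#  [0 1]]
```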
People try [106] because the list equivalent does work:
In [111]: y=np.arange(4).reshape(2,2)
In [112]: alist=y.tolist()
In [113]: alist[0],alist[1] = alist[1],alist[0]
In [114]: alist
Out[114]: [[2, 3], [0, 1]]
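The list version works because tuple assignment on a list only rebinds its slots to the existing row objects; no element data is copied, so nothing is overwritten mid-swap. With numpy, by contrast, y[0] = ... copies values into y's buffer. A minimal sketch (row0 and row1 are illustrative names):

```python
alist = [[0, 1], [2, 3]]
row0, row1 = alist[0], alist[1]

# Only the references stored in alist are swapped.
alist[0], alist[1] = alist[1], alist[0]
print(alist)             # [[2, 3], [0, 1]]
print(alist[0] is row1)  # True
print(alist[1] is row0)  # True
```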
List users, on the other hand, are often bitten by replicated references, as in:
In [125]: alist = [[1,2]]*3
In [126]: alist
Out[126]: [[1, 2], [1, 2], [1, 2]]
In [127]: alist[0][1] = 4
In [128]: alist
Out[128]: [[1, 4], [1, 4], [1, 4]]
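The usual fix is to build a separate inner list for each row, for example with a list comprehension. A short sketch (shared and separate are illustrative names):

```python
# [[1, 2]] * 3 repeats a reference to one inner list three times.
shared = [[1, 2]] * 3
print(shared[0] is shared[1])  # True

# A list comprehension creates a new inner list on every iteration.
separate = [[1, 2] for _ in range(3)]
separate[0][1] = 4
print(separate)  # [[1, 4], [1, 2], [1, 2]]
```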
