How does numpy/pandas index self-assignment work?

Time:01-07

I was experimenting with numpy when I noticed that:

import numpy as np

x = np.arange(4).reshape(2,2)
print(x)
x[[1,0]]=x
print(x)

Outputs:

[[0 1]
 [2 3]]
[[0 1]
 [0 1]]

Similarly:

import pandas as pd

df = pd.DataFrame({'x':[0,2],'y':[1,3]})
print(df)
df.iloc[[1,0]]=df
print(df)

Outputs:

   x  y
0  0  1
1  2  3
   x  y
0  0  1
1  0  1

Why?

CodePudding user response:

numpy does use buffering for things like += (see the np.add.at docs for ways around that). But for a simple assignment like this it doesn't.

y[[1,0]] = y+0 is an example of the buffering; the y+0 produces a new array, which is then assigned.
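To make the buffering point concrete, here is a small sketch of how an in-place += with a repeated index differs from np.add.at. The arrays a and b and the index list idx are made up for illustration; they are not from the question.

```python
import numpy as np

idx = [0, 0, 1]  # index 0 appears twice (illustrative values)

# Buffered: a[idx] + 1 is computed into a temporary array and then
# written back, so the repeated index 0 is only incremented once.
a = np.zeros(3, dtype=int)
a[idx] += 1

# Unbuffered: np.add.at applies the addition once per index occurrence.
b = np.zeros(3, dtype=int)
np.add.at(b, idx, 1)

print(a)  # [1 1 0]
print(b)  # [2 1 0]
```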

For what it's worth, assignment is implemented as y.__setitem__(idx, value). That method does not test whether value is y itself (or, more generally, a view of y). It's up to you, the user, to be cognizant of such issues.
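As a quick illustration, the indexing syntax and an explicit __setitem__ call should give the same result. The array named value below is an assumption for the demo, chosen so that it shares no memory with the targets.

```python
import numpy as np

y1 = np.arange(4).reshape(2, 2)
y2 = np.arange(4).reshape(2, 2)

# An independent array (illustrative), so no aliasing is involved here.
value = np.array([[10, 11], [12, 13]])

y1[[1, 0]] = value             # index-assignment syntax
y2.__setitem__([1, 0], value)  # the method call that syntax maps to

print(np.array_equal(y1, y2))  # True
print(y1)
# [[12 13]
#  [10 11]]
```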

A generalization of your example is to assign a view:

In [99]: y=np.arange(4).reshape(2,2)
In [100]: y[[1,0]] = y[::-1,::-1]
In [101]: y
Out[101]: 
array([[1, 1],
       [3, 3]])

y[::-1,::-1] is a new array object, but it is a view: it shares the underlying data buffer with y. So assigning to y modifies this view incrementally, while it is still being read as the source.
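If you are unsure whether a right-hand side aliases the target, np.shares_memory can tell you. A minimal sketch; the names view and fancy are just labels for this example:

```python
import numpy as np

y = np.arange(4).reshape(2, 2)

view = y[::-1, ::-1]  # basic slicing: a view onto y's data buffer
fancy = y[[1, 0]]     # advanced (list) indexing: an independent copy

print(np.shares_memory(y, view))   # True
print(np.shares_memory(y, fancy))  # False
```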

Making a copy avoids that:

In [102]: y=np.arange(4).reshape(2,2)
In [103]: y[[1,0]] = y[::-1,::-1].copy()
In [104]: y
Out[104]: 
array([[1, 0],
       [3, 2]])
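The same fix applied to the example from the question, as a sketch: copying the right-hand side first means the source rows are no longer overwritten while they are being read.

```python
import numpy as np

x = np.arange(4).reshape(2, 2)

# Snapshot x before assigning, so the source no longer aliases the target.
x[[1, 0]] = x.copy()

print(x)
# [[2 3]
#  [0 1]]
```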

More often I've seen questions about a double assignment like this:

In [105]: y=np.arange(4).reshape(2,2)
In [106]: y[0],y[1] = y[1],y[0]
In [107]: y
Out[107]: 
array([[2, 3],
       [2, 3]])

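Again, the y[1] and y[0] on the right-hand side are views of y, not copies. The first assignment (y[0] = y[1]) overwrites row 0, so the y[0] view already holds [2, 3] by the time it is assigned to y[1].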

The correct switching assignment is:

In [108]: y=np.arange(4).reshape(2,2)
In [109]: y[[0,1]] = y[[1,0]]
In [110]: y
Out[110]: 
array([[2, 3],
       [0, 1]])
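The list-indexed form works because y[[1,0]] builds a new array before anything is written. If you prefer the tuple form from [106], one option (a sketch, not the only way) is to copy the rows explicitly:

```python
import numpy as np

y = np.arange(4).reshape(2, 2)

# Both copies are made before either row is overwritten,
# so the swap no longer reads from already-modified data.
y[0], y[1] = y[1].copy(), y[0].copy()

print(y)
# [[2 3]
#  [0 1]]
```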

People try [106] because the list equivalent does work:

In [111]: y=np.arange(4).reshape(2,2)
In [112]: alist=y.tolist()
In [113]: alist[0],alist[1] = alist[1],alist[0]
In [114]: alist
Out[114]: [[2, 3], [0, 1]]

List users, on the other hand, are often bitten by replicated references, as in:

In [125]: alist = [[1,2]]*3
In [126]: alist
Out[126]: [[1, 2], [1, 2], [1, 2]]
In [127]: alist[0][1] = 4
In [128]: alist
Out[128]: [[1, 4], [1, 4], [1, 4]]
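The usual way around the replicated-reference problem is to build each inner list separately, for example with a list comprehension. A short sketch; the names shared and rows are just for this example:

```python
# [[1, 2]] * 3 repeats one inner list object three times.
shared = [[1, 2]] * 3
print(shared[0] is shared[1])  # True

# A comprehension creates a fresh inner list on every iteration.
rows = [[1, 2] for _ in range(3)]
print(rows[0] is rows[1])      # False

rows[0][1] = 4
print(rows)  # [[1, 4], [1, 2], [1, 2]]
```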