How do you slice a cross section in pandas or numpy?

I have the following DataFrame, which can be copied and loaded with df = pd.read_clipboard():

    0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15
0    0   1   5  12  13   7   1   5   5   1   7  13  12   5   1   0
1    1   0   4  13  12   6   0   4   4   0   6  12  13   4   0   1
2    5   4   0   9   8   2   4   0   0   4   2   8   9   0   4   5
3   12  13   9   0   1  11  13   9   9  13  11   1   0   9  13  12
4   13  12   8   1   0  10  12   8   8  12  10   0   1   8  12  13
5    7   6   2  11  10   0   6   2   2   6   0  10  11   2   6   7
6    1   0   4  13  12   6   0   4   4   0   6  12  13   4   0   1
7    5   4   0   9   8   2   4   0   0   4   2   8   9   0   4   5
8    5   4   0   9   8   2   4   0   0   4   2   8   9   0   4   5
9    1   0   4  13  12   6   0   4   4   0   6  12  13   4   0   1
10   7   6   2  11  10   0   6   2   2   6   0  10  11   2   6   7
11  13  12   8   1   0  10  12   8   8  12  10   0   1   8  12  13
12  12  13   9   0   1  11  13   9   9  13  11   1   0   9  13  12
13   5   4   0   9   8   2   4   0   0   4   2   8   9   0   4   5
14   1   0   4  13  12   6   0   4   4   0   6  12  13   4   0   1
15   0   1   5  12  13   7   1   5   5   1   7  13  12   5   1   0

I would like to take a cross section from it, for example:

[1, 4, 9, 1, 10, 6, 4, 0, 4, 6, 10, 1, 9, 4, 1]

which is index df.loc[1, 0], df.loc[2, 1], df.loc[3, 2], df.loc[4, 3], etc.

Is there a numpy or pandas pattern for getting this type of cross slice that is easier than writing out many separate index lookups, as I'm doing now? Thanks.

CodePudding user response：

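We can use np.diagonal with offset=1 to select the elements just above the main diagonal. The indices in the question (df.loc[1, 0], df.loc[2, 1], and so on) lie just below it, which corresponds to offset=-1; because this frame is symmetric, both offsets return the same values here.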

np.diagonal(df, offset=1)

array([ 1,  4,  9,  1, 10,  6,  4,  0,  4,  6, 10,  1,  9,  4,  1])
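To make the sign of offset concrete, here is a minimal sketch on a small asymmetric array (hypothetical data, not the frame from the question). With a DataFrame, the analogous call would be np.diagonal(df.to_numpy(), offset=-1).

```python
import numpy as np

# Small asymmetric stand-in array (hypothetical data), so the two
# offsets produce visibly different results.
arr = np.arange(16).reshape(4, 4)

below = np.diagonal(arr, offset=-1)  # elements (1, 0), (2, 1), (3, 2)
above = np.diagonal(arr, offset=1)   # elements (0, 1), (1, 2), (2, 3)

print(below)  # [ 4  9 14]
print(above)  # [ 1  6 11]
```

A negative offset walks the diagonal that starts at row 1, column 0, which is the pattern the question lists.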

CodePudding user response：

If you have a numpy array, you can actually get a slice. The difference between a slice and an advanced indexing expression is that a slice returns a view into the original data, while an advanced index always makes a copy. If the array is C-contiguous, you can use ravel to get a view:

arr = df.to_numpy()

row = 1
col = 0
n = 4
view = arr.ravel()[row * arr.shape[1] + col:(row + n - 1) * arr.shape[1] + col + n:arr.shape[1] + 1]

If you don't have a contiguous array, it takes a little bit more work, since you need to set the strides of the view manually. You can use np.lib.stride_tricks.as_strided:

view = np.lib.stride_tricks.as_strided(arr[row:, col:], shape=(n,), strides=(arr.strides[0] + arr.strides[1],))

This should be identical to the much simpler method presented in the accepted answer.
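As a sanity check on the two view-based approaches, the following sketch runs them on a small hypothetical array and compares them with advanced indexing. The names flat_view, strided_view, and expected are illustrative only, and the data is assumed to be C-contiguous.

```python
import numpy as np

arr = np.arange(36).reshape(6, 6)  # hypothetical C-contiguous data
row, col, n = 1, 0, 4
ncols = arr.shape[1]

# Slice the flattened array: start at (row, col) and step by one row
# plus one column each time.
start = row * ncols + col
stop = (row + n - 1) * ncols + col + n
flat_view = arr.ravel()[start:stop:ncols + 1]

# Same elements via explicit strides; shape and strides are tuples.
strided_view = np.lib.stride_tricks.as_strided(
    arr[row:, col:], shape=(n,), strides=(arr.strides[0] + arr.strides[1],)
)

# Advanced indexing for comparison; this produces a copy.
expected = arr[np.arange(row, row + n), np.arange(col, col + n)]

print(flat_view)                         # [ 6 13 20 27]
print(np.shares_memory(flat_view, arr))  # True
print(np.shares_memory(expected, arr))   # False
```

Because as_strided does no bounds checking, shape and strides that reach past the end of the buffer read invalid memory, so the ravel slice is the safer choice when the array is contiguous.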

CodePudding user response：

I use numpy to solve this.

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16 * 16).reshape(16, 16))
print(df)

print(df.to_numpy()[range(15), range(1, 16)])

CodePudding user response：

You can use numpy advanced indexing:

For example, if you want to select

df.loc[1, 0], df.loc[2, 1], df.loc[3, 2], df.loc[4, 3]

first convert the DataFrame to a NumPy array, then index the elements using matching lists of row and column indices:

df_to_arr = df.to_numpy()
out = df_to_arr[[1,2,3,4], [0,1,2,3]]

Output:

array([1, 4, 9, 1], dtype=int64)
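For longer cross sections, the index lists can be generated rather than typed out. This is a minimal sketch on hypothetical data; rows, cols, and n are illustrative names.

```python
import numpy as np

arr = np.arange(25).reshape(5, 5)  # hypothetical data

n = 4
rows = np.arange(1, 1 + n)  # 1, 2, 3, 4
cols = np.arange(0, n)      # 0, 1, 2, 3

# Pairs rows[i] with cols[i]: (1, 0), (2, 1), (3, 2), (4, 3)
out = arr[rows, cols]
print(out)  # [ 5 11 17 23]
```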