How do you slice a cross section in pandas or numpy?

Time:01-29

I have the following data frame, which can be copied and loaded into a data frame with df = pd.read_clipboard():

    0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15
0    0   1   5  12  13   7   1   5   5   1   7  13  12   5   1   0
1    1   0   4  13  12   6   0   4   4   0   6  12  13   4   0   1
2    5   4   0   9   8   2   4   0   0   4   2   8   9   0   4   5
3   12  13   9   0   1  11  13   9   9  13  11   1   0   9  13  12
4   13  12   8   1   0  10  12   8   8  12  10   0   1   8  12  13
5    7   6   2  11  10   0   6   2   2   6   0  10  11   2   6   7
6    1   0   4  13  12   6   0   4   4   0   6  12  13   4   0   1
7    5   4   0   9   8   2   4   0   0   4   2   8   9   0   4   5
8    5   4   0   9   8   2   4   0   0   4   2   8   9   0   4   5
9    1   0   4  13  12   6   0   4   4   0   6  12  13   4   0   1
10   7   6   2  11  10   0   6   2   2   6   0  10  11   2   6   7
11  13  12   8   1   0  10  12   8   8  12  10   0   1   8  12  13
12  12  13   9   0   1  11  13   9   9  13  11   1   0   9  13  12
13   5   4   0   9   8   2   4   0   0   4   2   8   9   0   4   5
14   1   0   4  13  12   6   0   4   4   0   6  12  13   4   0   1
15   0   1   5  12  13   7   1   5   5   1   7  13  12   5   1   0

I would like to take a cross section from it. I want something like, say:

[1, 4, 9, 1, 10, 6, 4, 0, 4, 6, 10, 1, 9, 4, 1]

which corresponds to df.loc[1, 0], df.loc[2, 1], df.loc[3, 2], df.loc[4, 3], etc.

Is there a numpy or pandas pattern for getting this type of cross slice that is easier than listing many separate indexes, as I'm doing now? Thanks.

CodePudding user response:

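We can use np.diagonal with offset=1 to select the diagonal just above the main diagonal. Because this frame is symmetric, that diagonal holds the same values as the one just below the main diagonal (df.loc[1, 0], df.loc[2, 1], ...), which offset=-1 selects directly.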

np.diagonal(df, offset=1)

array([ 1,  4,  9,  1, 10,  6,  4,  0,  4,  6, 10,  1,  9,  4,  1])
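The frame in the question is symmetric, so the diagonals just above and just below the main diagonal happen to hold the same values. A minimal sketch on a non-symmetric array (the `arr` here is made up for illustration and is not the question's data) shows which elements each sign of `offset` picks:

```python
import numpy as np

# Hypothetical 4x4 array with distinct values, so the two diagonals differ.
arr = np.arange(16).reshape(4, 4)

above = np.diagonal(arr, offset=1)   # arr[0, 1], arr[1, 2], arr[2, 3]
below = np.diagonal(arr, offset=-1)  # arr[1, 0], arr[2, 1], arr[3, 2]

print(above)  # [ 1  6 11]
print(below)  # [ 4  9 14]
```

For data that is not symmetric, offset=-1 is the one that matches the df.loc[1, 0], df.loc[2, 1], ... pattern from the question.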

CodePudding user response:

If you have a numpy array, you can actually get a slice. The difference between a slice and an advanced indexing expression is that a slice returns a view into the original data, while an advanced index always makes a copy. If the array is C-contiguous, you can use ravel to get a view:

arr = df.to_numpy()

row = 1
col = 0
n = 4
view = arr.ravel()[row * arr.shape[1] + col:(row + n - 1) * arr.shape[1] + col + n:arr.shape[1] + 1]

If you don't have a contiguous array, it takes a little bit more work, since you need to set the strides of the view manually. You can use np.lib.stride_tricks.as_strided:

view = np.lib.stride_tricks.as_strided(arr[row:, col:], shape=(n,), strides=(arr.strides[0] + arr.strides[1],))

This should be identical to the much simpler method presented in the accepted answer.
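To make the view-versus-copy distinction concrete, here is a small sketch. It assumes a C-contiguous array; the `arr` built with np.arange is a stand-in for df.to_numpy(), and the names `start`, `stop`, and `idx` are introduced only for this illustration.

```python
import numpy as np

# Hypothetical C-contiguous 16x16 array standing in for df.to_numpy().
arr = np.arange(16 * 16).reshape(16, 16)
ncols = arr.shape[1]

row, col, n = 1, 0, 4

# Slice the flattened array: start at (row, col), then step by one row plus one column.
start = row * ncols + col
stop = (row + n - 1) * ncols + col + n
view = arr.ravel()[start:stop:ncols + 1]

# Advanced indexing selects the same elements but returns a copy.
idx = np.arange(n)
copy = arr[row + idx, col + idx]

print(np.array_equal(view, copy))   # both hold the same values
print(np.shares_memory(view, arr))  # the slice shares memory with arr
print(np.shares_memory(copy, arr))  # the advanced-index result does not
```

Because the slice is a view, writing to `view` would also change `arr`, which is not the case for the copy.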

CodePudding user response:

I use numpy to solve this.

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16 * 16).reshape(16, 16))
print(df)

print(df.to_numpy()[range(15), range(1, 16)])

CodePudding user response:

You can use numpy advanced indexing:

For example, if you want to select

df.loc[1, 0], df.loc[2, 1], df.loc[3, 2], df.loc[4, 3]

first convert the data frame to a numpy array, then index the elements using the matching row and column indices:

df_to_arr = df.to_numpy()
out = df_to_arr[[1,2,3,4], [0,1,2,3]]

Output:

array([1, 4, 9, 1], dtype=int64)
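The index lists do not have to be typed out by hand. As a sketch, assuming the goal is the whole diagonal just below the main one, they can be generated with np.arange (the 5x5 `arr` below is a hypothetical stand-in for df.to_numpy()):

```python
import numpy as np

# Hypothetical 5x5 array standing in for df.to_numpy().
arr = np.arange(25).reshape(5, 5)

# Pair row indices 1..4 with column indices 0..3.
rows = np.arange(1, arr.shape[0])
cols = rows - 1

out = arr[rows, cols]
print(out)  # [ 5 11 17 23]
```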