What are the pros/cons of using pd.Index vs df.loc?

What is the difference between selecting columns with a pd.Index and using df.loc? Are they effectively the same thing?

import pandas as pd

idx = pd.Index(('a', 'b'))
df = pd.DataFrame({'a': [0, 1], 'b': [2, 3], 'c': [0, 5]})

# Both calls print the same two columns
print(df.loc[:, ('a', 'b')])
print(df[idx])

   a  b
0  0  2
1  1  3

CodePudding user response：

With loc, you can select by row labels, by column labels, or by both at once. Indexing with a pd.Index, as in df[idx], can only select columns.

df.loc[[0]]
   a  b  c
0  0  2  0

df.loc[[0],['a','b']]
   a  b
0  0  2

IMO, loc is more flexible, and I would choose it because the code stays clearer in the long run and is easier to follow when you come back to it later.
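
To make the difference concrete, here is a minimal sketch using the df from the question. The variable names (cols_only, rows_and_cols, masked) are only for illustration. It shows df[idx] picking columns, and loc picking rows, columns, or both in a single call.

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1], 'b': [2, 3], 'c': [0, 5]})
idx = pd.Index(['a', 'b'])

# df[idx] treats the Index as a list of column labels
cols_only = df[idx]

# loc takes a row selector and a column selector together
rows_only = df.loc[[0]]              # row label 0, all columns
rows_and_cols = df.loc[[0], idx]     # row label 0, columns a and b
masked = df.loc[df['c'] > 0, idx]    # boolean row mask, columns a and b

print(cols_only.equals(df.loc[:, idx]))  # True
print(masked)
```

For pure column selection the two forms give the same result; loc only starts to matter once rows are involved as well.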

CodePudding user response：

The documentation explains why loc is the preferred method. Selecting with df[...] and then assigning into the result can lead to a SettingWithCopyWarning:

idx = ['a', 'b']
d = df[idx]
d.iloc[0,0] = 9

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

In contrast, using loc doesn't trigger the SettingWithCopyWarning:

idx = ['a', 'b']
d = df.loc[:,idx]
d.iloc[0,0] = 9
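
One point worth spelling out, as a hedged sketch and not a statement about every pandas version: with a list of labels, both forms return a new object, so assigning into d leaves df unchanged in either case. If the goal is to modify df itself, a single loc assignment on df avoids the ambiguity. The warning filter below is an assumption added only to keep the output quiet on versions that emit the warning.

```python
import warnings
import pandas as pd

df = pd.DataFrame({'a': [0, 1], 'b': [2, 3], 'c': [0, 5]})
idx = ['a', 'b']

with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # silence the warning where pandas emits it
    d1 = df[idx]
    d1.iloc[0, 0] = 9        # modifies d1 only
    d2 = df.loc[:, idx]
    d2.iloc[0, 0] = 9        # modifies d2 only

unchanged = df.loc[0, 'a']   # value in df after editing the copies
print(unchanged)             # 0: df was not touched

# To change df itself, assign through loc on df in one step
df.loc[0, 'a'] = 9
print(df.loc[0, 'a'])        # 9
```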

Of note, loc also lets you pass a specific axis as a parameter, so the key is applied to that axis only:

df.loc(axis=1)[idx]
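
As a quick check of the axis form, the sketch below compares the three spellings on the df from the question. The names by_axis, by_slice, and by_getitem are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1], 'b': [2, 3], 'c': [0, 5]})
idx = ['a', 'b']

by_axis = df.loc(axis=1)[idx]   # axis=1 means idx refers to columns
by_slice = df.loc[:, idx]       # explicit "all rows" slice
by_getitem = df[idx]            # plain column selection

print(by_axis.equals(by_slice) and by_axis.equals(by_getitem))  # True
```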