Select all the subcolumns with a given name from pandas dataframe-CodePudding

I have a pandas dataframe df that I built using 3 levels of columns, as follows:

     a1            a2            a3
     b1     b2     b1     b3     b1     b4
     c1 c2  c1 c2  c1 c2  c1 c2  c1 c2  c1 c2
     ... (data) ...

Note that each a column may have different b subcolumns, but each b column has the same c subcolumns.

I can extract e.g. the subcolumns from a2 using df["a2"].

How can I select based on the second or third level without having to specify the first and second level respectively? For instance I would like to say "give me all the c2 columns you can find" and I would get:

     a1     a2     a3
     b1 b2  b1 b3  b1 b4
     ... (data for the c2 columns) ...

Or "give me all the b1 columns" and I would get:

     a1     a2     a3
     c1 c2  c1 c2  c1 c2 
     ... (data for the b1 columns) ...

CodePudding user response：

The docs provide some info on that. Adapting the examples from there to your example, either use tuples with slice objects you pass None,

df.loc[:, (slice(None), slice(None), "c2")]

or use pd.IndexSliceto use the familiar : notation:

idx = pd.IndexSlice
df.loc[:, idx[:, :, "c2"]]

CodePudding user response：

When you have a dataframe with a MultiIndex the columns are a list of tuples containing the keys of each level.

You can get the columns you want with a list comprehension checking if the name you want is in the tuple:

# get all the "c2"
df[[col for col in df.columns if "c2" in col]]
# get all the "b1"
df[[col for col in df.columns if "b1" in col]]

If you want to be sure that the column name is in the right position:

# get all the "c2"
df[[col for col in df.columns if col[2] == "c2"]]
# get all the "b1"
df[[col for col in df.columns if col[1] == "b1"]]