Given the dataframe below, how does one apply a function (e.g. np.log) to both e columns? I'm pretty sure this calls for apply or map, but I have never done this with a dataframe that has MultiIndex columns.
import numpy as np
import pandas as pd
df = pd.DataFrame([[1,2,3,4], [5,6,7,8], [9,10,11,12], [13,14,15,16]])
df.columns = pd.MultiIndex.from_tuples((("a", "d"), ("a", "e"), ("b", "d"), ("b","e")))
df
    a       b
    d   e   d   e
0   1   2   3   4
1   5   6   7   8
2   9  10  11  12
3  13  14  15  16
Desired output (produced here with a .loc assignment):
df.loc[:, (slice(None), ['e'])] = df.loc[:, (slice(None), ['e'])].apply(np.log).round(2)
df
    a         b
    d     e   d     e
0   1  0.69   3  1.39
1   5  1.79   7  2.08
2   9  2.30  11  2.48
3  13  2.64  15  2.77
I realize I could use an inelegant method such as the one above, but I would like to use functional programming techniques.
CodePudding user response:
You could use swaplevel to move the inner column level (d/e) to the top, so that both e columns can be selected directly with ['e'], then swap the levels back afterwards:
df = df.swaplevel(axis=1).assign(e=df.swaplevel(axis=1)['e'].apply(np.log).round(2)).swaplevel(axis=1)
Output:
>>> df
    a         b
    d     e   d     e
0   1  0.69   3  1.39
1   5  1.79   7  2.08
2   9  2.30  11  2.48
3  13  2.64  15  2.77
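The one-liner above packs three steps into a single chain. As a rough sketch of what it does, the version below splits it into named intermediate steps; the names swapped, logged and result are only illustrative and are not part of the original answer. It assumes the same df as in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
df.columns = pd.MultiIndex.from_tuples((("a", "d"), ("a", "e"), ("b", "d"), ("b", "e")))

# Step 1: swap the two column levels so that "d"/"e" become the outer level.
swapped = df.swaplevel(axis=1)

# Step 2: select every "e" column at once and transform the whole block.
logged = swapped["e"].apply(np.log).round(2)

# Step 3: put the transformed block back under "e" (assign works on a copy),
# then restore the original level order.
result = swapped.assign(e=logged).swaplevel(axis=1)

print(result)
```

Because swaplevel and assign both return new objects, this version leaves df itself untouched and binds the transformed frame to result instead.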
CodePudding user response:
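Use loc with axis=1 to select on the columns axis; the slice : keeps every label of the outer level and 'e' picks the inner level:
# transform and apply are interchangeable here, since np.log works elementwise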
df.loc(axis=1)[:, 'e'] = df.loc(axis=1)[:, 'e'].transform(np.log).round(2)
df
    a         b
    d     e   d     e
0   1  0.69   3  1.39
1   5  1.79   7  2.08
2   9  2.30  11  2.48
3  13  2.64  15  2.77
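Since the question asks for a more functional style, here is one possible sketch that avoids in-place assignment altogether. The helper map_columns is a hypothetical name made up for this illustration, not a pandas function. It relies on the fact that DataFrame.apply passes each column as a Series whose name is the full column tuple, e.g. ("a", "e").

```python
import numpy as np
import pandas as pd


def map_columns(frame, func, label, level=1):
    """Return a new frame with func applied to the columns whose
    label at the given column level equals `label` (hypothetical helper)."""
    def transform(col):
        # With MultiIndex columns, col.name is a tuple such as ("a", "e").
        return func(col) if col.name[level] == label else col

    # Column-wise apply builds a new DataFrame; the input is not modified.
    return frame.apply(transform)


df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
df.columns = pd.MultiIndex.from_tuples((("a", "d"), ("a", "e"), ("b", "d"), ("b", "e")))

# pipe lets the helper sit in a method chain like any built-in method.
result = df.pipe(map_columns, lambda s: np.log(s).round(2), "e")

print(result)
```

The trade-off is that the function is called once per column rather than once on the selected block, which is fine for a handful of columns but may be slower than the loc-based assignment on very wide frames.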
