Functional programming question: How to apply a function to the same column in a dataframe that has

Time:01-28

Consider the dataframe below. How does one apply a function (e.g. np.log) to both of the e columns? I'm pretty sure one needs to use apply or map, but I have never done this with a dataframe that has MultiIndex columns.

import numpy as np
import pandas as pd

df = pd.DataFrame([[1,2,3,4], [5,6,7,8], [9,10,11,12], [13,14,15,16]])
df.columns = pd.MultiIndex.from_tuples((("a", "d"), ("a", "e"), ("b", "d"), ("b","e")))
df

    a       b
    d   e   d   e
0   1   2   3   4
1   5   6   7   8
2   9  10  11  12
3  13  14  15  16

Desired output:

df.loc[:, (slice(None), ['e'])] = df.loc[:, (slice(None), ['e'])].apply(np.log).round(2)

df
    a         b
    d     e   d     e
0   1  0.69   3  1.39
1   5  1.79   7  2.08
2   9  2.30  11  2.48
3  13  2.64  15  2.77

I realize I could use an inelegant method such as the one above, but I would like to use functional programming techniques.

CodePudding user response:

You could use swaplevel to move the inner level (d, e) to the top of the column index, making the e columns directly accessible via indexing:

df = df.swaplevel(axis=1).assign(e=df.swaplevel(axis=1)['e'].apply(np.log).round(2)).swaplevel(axis=1)

Output:

>>> df
    a         b      
    d     e   d     e
0   1  0.69   3  1.39
1   5  1.79   7  2.08
2   9  2.30  11  2.48
3  13  2.64  15  2.77
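
To see why the one-liner works, it may help to look at the intermediate pieces. The sketch below rebuilds the question's dataframe and prints what swaplevel and the ['e'] selection return; the names swapped, e_cols and logged are only illustrative and do not appear in the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
df.columns = pd.MultiIndex.from_tuples((("a", "d"), ("a", "e"), ("b", "d"), ("b", "e")))

# Swapping the two column levels puts d/e on top and a/b underneath.
swapped = df.swaplevel(axis=1)

# Selecting the top-level key "e" returns a sub-frame with plain columns a and b.
e_cols = swapped["e"]

# The function is then applied to that sub-frame only.
logged = e_cols.apply(np.log).round(2)

print(list(swapped.columns))
print(logged)
```

The assign call in the answer then writes this sub-frame back under the e key, and the final swaplevel restores the original level order.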

CodePudding user response:

Use loc and select on the columns axis:

# transform and apply are interchangeable here
df.loc(axis=1)[:, 'e'] = df.loc(axis=1)[:, 'e'].transform(np.log).round(2)

df
    a         b
    d     e   d     e
0   1  0.69   3  1.39
1   5  1.79   7  2.08
2   9  2.30  11  2.48
3  13  2.64  15  2.77
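
If the goal is something closer to a functional style, one possible alternative (not taken from either answer) is to pass a function to DataFrame.apply and decide per column whether to transform it. This is a minimal sketch: log_if_e is a hypothetical helper name, and it assumes the label to match sits in the second column level. It returns a new dataframe and leaves df unchanged.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
df.columns = pd.MultiIndex.from_tuples((("a", "d"), ("a", "e"), ("b", "d"), ("b", "e")))


def log_if_e(col):
    # Hypothetical helper: col.name is the full column label, e.g. ("a", "e").
    # Columns whose second-level label is "e" are logged and rounded;
    # all other columns are returned unchanged.
    if col.name[1] == "e":
        return np.log(col).round(2)
    return col


# apply calls log_if_e once per column and assembles a new dataframe.
result = df.apply(log_if_e)
print(result)
```

Because nothing is assigned back into df, the d columns keep their integer dtype and the original dataframe can still be reused afterwards.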