Home > Enterprise >  Selecting columns by their position while using a function in pandas
Selecting columns by their position while using a function in pandas

Time:01-25

My dataframe looks somthing like this

frame = pd.DataFrame({'id':[1,2,3,4,5],
                         'week1_values':[0,0,13,39,64],
                         'week2_values':[32,35,25,78,200]})

I am trying to apply a function to calculate the Week over Week percentage difference between two columns('week1_values' and 'week2_values') whose names are being generated dynamically.

I want to create a function to calculate the percentage difference between weeks keeping in mind the zero values in the 'week1_values' column.

My function is something like this:

def WoW(df):
     if df.iloc[:,1] == 0:
            return (df.iloc[:,1] - df.iloc[:,2])  
        else:
            return ((df.iloc[:,1] - df.iloc[:,2]) / df.iloc[:,1]) *100
 frame['WoW%'] = frame.apply(WoW,axis=1)

When i try to do that, i end up with this error

IndexingError: ('Too many indexers', 'occurred at index 0')

How is it that one is supposed to specify columns by their positions inside a function?

PS: Just want to clarify that since the column names are being generated dynamically, i am trying to select them by their position with iloc function.

CodePudding user response:

Because working with Series, remove indexing columns:

def WoW(df):
     if df.iloc[1] == 0:
            return (df.iloc[1] - df.iloc[2])  
     else:
            return ((df.iloc[1] - df.iloc[2]) / df.iloc[1]) *100
frame['WoW%'] = frame.apply(WoW,axis=1)

Vectorized alternative:

s = frame.iloc[:,1] - frame.iloc[:,2]
frame['WoW%1'] = np.where(frame.iloc[:, 1] == 0, s, (s / frame.iloc[:,1]) *100)
print (frame)
   id  week1_values  week2_values        WoW%       WoW%1
0   1             0            32  -32.000000  -32.000000
1   2             0            35  -35.000000  -35.000000
2   3            13            25  -92.307692  -92.307692
3   4            39            78 -100.000000 -100.000000
4   5            64           200 -212.500000 -212.500000

CodePudding user response:

You can use pandas pct_change method to automatically compute the percent change.

s = (frame.iloc[:, 1:].pct_change(axis=1).iloc[:, -1]*100)
frame['WoW%'] = s.mask(np.isinf(s), frame.iloc[:, -1])

output:

   id  week1_values  week2_values         WoW
0   1             0            32   32.000000
1   2             0            35   35.000000
2   3            13            25   92.307692
3   4            39            78  100.000000
4   5            64           200  212.500000

Note however that the way you currently do it in your custom function is biased. Changes from 0->20, or 10->12, or 100->120 would all produce 20 as output, which seems ambiguous.

suggested alternative

use a classical percent increase, even if it leads to infinite:

frame['WoW'] = frame.iloc[:, 1:].pct_change(axis=1).iloc[:, -1]*100

output:

   id  week1_values  week2_values         WoW
0   1             0            32         inf
1   2             0            35         inf
2   3            13            25   92.307692
3   4            39            78  100.000000
4   5            64           200  212.500000
  •  Tags:  
  • Related