My dataframe looks somthing like this
frame = pd.DataFrame({'id':[1,2,3,4,5],
'week1_values':[0,0,13,39,64],
'week2_values':[32,35,25,78,200]})
I am trying to apply a function to calculate the Week over Week percentage difference between two columns('week1_values' and 'week2_values') whose names are being generated dynamically.
I want to create a function to calculate the percentage difference between weeks keeping in mind the zero values in the 'week1_values' column.
My function is something like this:
def WoW(df):
if df.iloc[:,1] == 0:
return (df.iloc[:,1] - df.iloc[:,2])
else:
return ((df.iloc[:,1] - df.iloc[:,2]) / df.iloc[:,1]) *100
frame['WoW%'] = frame.apply(WoW,axis=1)
When i try to do that, i end up with this error
IndexingError: ('Too many indexers', 'occurred at index 0')
How is it that one is supposed to specify columns by their positions inside a function?
PS: Just want to clarify that since the column names are being generated dynamically, i am trying to select them by their position with iloc function.
CodePudding user response:
Because working with Series, remove indexing columns:
def WoW(df):
if df.iloc[1] == 0:
return (df.iloc[1] - df.iloc[2])
else:
return ((df.iloc[1] - df.iloc[2]) / df.iloc[1]) *100
frame['WoW%'] = frame.apply(WoW,axis=1)
Vectorized alternative:
s = frame.iloc[:,1] - frame.iloc[:,2]
frame['WoW%1'] = np.where(frame.iloc[:, 1] == 0, s, (s / frame.iloc[:,1]) *100)
print (frame)
id week1_values week2_values WoW% WoW%1
0 1 0 32 -32.000000 -32.000000
1 2 0 35 -35.000000 -35.000000
2 3 13 25 -92.307692 -92.307692
3 4 39 78 -100.000000 -100.000000
4 5 64 200 -212.500000 -212.500000
CodePudding user response:
You can use pandas pct_change method to automatically compute the percent change.
s = (frame.iloc[:, 1:].pct_change(axis=1).iloc[:, -1]*100)
frame['WoW%'] = s.mask(np.isinf(s), frame.iloc[:, -1])
output:
id week1_values week2_values WoW
0 1 0 32 32.000000
1 2 0 35 35.000000
2 3 13 25 92.307692
3 4 39 78 100.000000
4 5 64 200 212.500000
Note however that the way you currently do it in your custom function is biased. Changes from 0->20, or 10->12, or 100->120 would all produce 20 as output, which seems ambiguous.
suggested alternative
use a classical percent increase, even if it leads to infinite:
frame['WoW'] = frame.iloc[:, 1:].pct_change(axis=1).iloc[:, -1]*100
output:
id week1_values week2_values WoW
0 1 0 32 inf
1 2 0 35 inf
2 3 13 25 92.307692
3 4 39 78 100.000000
4 5 64 200 212.500000
