So, given a dataframe as follows:
Item Jan_20 Apr_20 Aug_20 Oct_20
Apple 3 4 3 4
Orange 5 6 1 2
I need to identify change in value (increase/decrease) on each product, over the four quarters, and aggregate these fluctuations for each quarter.
- If the value goes up one quarter compared to the previous quarter, I need to add the increase to an aggregated value
Upfor that quarter. - If it goes down, I need to record it as
Downfor that quarter.
Desired output:
Type Jan_20 Apr_20 Aug_20 Oct_20
Up 0 2 0 2
Down 0 0 6 0
How could I achieve this?
CodePudding user response:
Try this:
diff = df.set_index('Item').diff(axis=1).sum(axis=0)
new_df = (pd.DataFrame([
np.where(diff.ge(0), diff, 0),
np.where(diff.lt(0), abs(diff), 0)
],
index=['Up', 'Down'],
columns=diff.index
)
.astype(int))
Output:
>>> new_df
Jan_20 Apr_20 Aug_20 Oct_20
Up 0 2 0 2
Down 0 0 6 0
CodePudding user response:
To get differences between columns, use pandas .diff(axis=1):
dd = df.diff(axis=1).fillna(0).astype(int)
Jan_20 Apr_20 Aug_20 Oct_20
Item
Apple 0 1 -1 1
Orange 0 1 -5 1
Then to get the aggregates for positive/negative we can use dd.where(dd>0, 0).sum() to get 'Up':
pd.DataFrame.from_records({'Up': dd.where(dd>0, 0).sum(), 'Down': -dd.where(dd<0, 0).sum()}).transpose()
Jan_20 Apr_20 Aug_20 Oct_20
Down 0 0 6 0
Up 0 2 0 2
(pandas gave us an unwanted sort of the row index in alphabetical order, you can trivially fix that)
