Home > database >  Pandas: How do I get aggregate value changes between columns?
Pandas: How do I get aggregate value changes between columns?

Time:02-02

So, given a dataframe as follows:

Item    Jan_20  Apr_20  Aug_20  Oct_20
Apple       3       4       3       4
Orange      5       6       1       2

I need to identify change in value (increase/decrease) on each product, over the four quarters, and aggregate these fluctuations for each quarter.

  • If the value goes up one quarter compared to the previous quarter, I need to add the increase to an aggregated value Up for that quarter.
  • If it goes down, I need to record it as Down for that quarter.

Desired output:

Type                Jan_20  Apr_20  Aug_20  Oct_20
Up                      0      2       0        2
Down                    0      0       6        0

How could I achieve this?

CodePudding user response:

Try this:

diff = df.set_index('Item').diff(axis=1).sum(axis=0)
new_df = (pd.DataFrame([
            np.where(diff.ge(0), diff, 0),
            np.where(diff.lt(0), abs(diff), 0)
        ],
        index=['Up', 'Down'],
        columns=diff.index
    )
    .astype(int))

Output:

>>> new_df
      Jan_20  Apr_20  Aug_20  Oct_20
Up         0       2       0       2
Down       0       0       6       0

CodePudding user response:

To get differences between columns, use pandas .diff(axis=1):

dd = df.diff(axis=1).fillna(0).astype(int)

        Jan_20  Apr_20  Aug_20  Oct_20
Item                                  
Apple        0       1      -1       1
Orange       0       1      -5       1

Then to get the aggregates for positive/negative we can use dd.where(dd>0, 0).sum() to get 'Up':

pd.DataFrame.from_records({'Up': dd.where(dd>0, 0).sum(), 'Down': -dd.where(dd<0, 0).sum()}).transpose()

      Jan_20  Apr_20  Aug_20  Oct_20
Down       0       0       6       0
Up         0       2       0       2

(pandas gave us an unwanted sort of the row index in alphabetical order, you can trivially fix that)

  •  Tags:  
  • Related