I have the following dataframe (df):
                      value1  value2  total
Index1 Index2 Index3
A0     B0     C0          10      15     25
              C1          11      12     23
              C2           1       2      3
A1     B1     C0           2       3      5
              C1           3       4      7
              C2           4       5      9
...
What I want to do is to divide value1 and value2 by total only if the value of Index3 is C0.
In the case of C1 and C2, value1 and value2 must be divided by the value1 and value2, respectively, of the row in the same (Index1, Index2) group where Index3 is C0.
As a result, the new dataframe must be:
                      value1  value2  total
Index1 Index2 Index3
A0     B0     C0       10/25   15/25     25
              C1       11/10   12/15     23
              C2        1/10    2/15      3
A1     B1     C0         2/5     3/5      5
              C1         3/2     4/3      7
              C2         4/2     5/3      9
...
How can I achieve this operation?
CodePudding user response:
You can declare denominator columns based on your requirements:
- If `Index3 == 'C0'`, divide by `total`.
- Otherwise, divide by the value where `Index3 == 'C0'`.
import numpy as np
import pandas as pd

# Reset index to be able to reference `Index` columns more easily
df.reset_index(inplace=True)
# Create catalog of divisors from the rows where `Index3 == 'C0'`
cat = df.loc[
    df['Index3'].eq('C0'),
    ['Index1', 'Index2', 'value1', 'value2']
].rename(columns={'value1': 'div1', 'value2': 'div2'})
# Merge data with catalog
df = pd.merge(df, cat, on=['Index1', 'Index2'], how='inner')
# Update divisors to `total` when `Index3 == 'C0'`
df[['div1', 'div2']] = df[['div1', 'div2']].apply(
    lambda x: np.where(
        df['Index3'] == 'C0', df['total'], x
    )
)
# Restore multi index
df.set_index(['Index1', 'Index2', 'Index3'], inplace=True)
The data now looks as follows:
                      value1  value2  total  div1  div2
Index1 Index2 Index3
A0     B0     C0          10      15     25    25    25
              C1          11      12     23    10    15
              C2           1       2      3    10    15
A1     B1     C0           2       3      5     5     5
              C1           3       4      7     2     3
              C2           4       5      9     2     3
So you only have to divide value1 by div1 and value2 by div2:
# Update `value1` and `value2` by dividing each by its corresponding divisor column
df['value1'] = df['value1'] / df['div1']
df['value2'] = df['value2'] / df['div2']
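For comparison, here is a rough merge-free sketch of the same idea. It rebuilds the sample data from the question and assumes every (Index1, Index2) group has exactly one C0 row. The names `cols`, `is_c0`, `c0`, `base`, `divisor` and `result` are only illustrative and do not come from the question.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question
idx = pd.MultiIndex.from_tuples(
    [("A0", "B0", "C0"), ("A0", "B0", "C1"), ("A0", "B0", "C2"),
     ("A1", "B1", "C0"), ("A1", "B1", "C1"), ("A1", "B1", "C2")],
    names=["Index1", "Index2", "Index3"],
)
df = pd.DataFrame(
    {"value1": [10, 11, 1, 2, 3, 4],
     "value2": [15, 12, 2, 3, 4, 5],
     "total": [25, 23, 3, 5, 7, 9]},
    index=idx,
)

cols = ["value1", "value2"]

# Boolean array marking the rows where Index3 == 'C0'
is_c0 = df.index.get_level_values("Index3") == "C0"

# C0 rows only, indexed by (Index1, Index2)
# (assumption: exactly one C0 row per group)
c0 = df.xs("C0", level="Index3")[cols]

# Repeat each group's C0 values for every row of that group
base = c0.reindex(df.index.droplevel("Index3")).to_numpy()

# C0 rows are divided by `total`, all other rows by the group's C0 values
divisor = np.where(is_c0[:, None], df[["total"]].to_numpy(), base)

result = df.copy()
result[cols] = df[cols].to_numpy() / divisor
print(result)
```

This version leaves `df` untouched and never creates the helper `div1`/`div2` columns, so there is nothing to drop afterwards.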
