Home > Back-end >  pandas dataframe with multiple conditions and calculation
pandas dataframe with multiple conditions and calculation

Time:01-07

I have the below df. Need to the replace the Value of 'Numerical' with a calculation as '4% of the Type_1' when 2 conditions are met: Type = Type_2 and the Numerical is > 0

df_in = pd.DataFrame([['Jan-2021','P_1','Type_1','5'],['Jan-2021','P_1','Type_2','10'],['Jan-2021','P_1','Type_3','15'],
              ['Feb-2021','P_1','Type_1','7'],['Feb-2021','P_1','Type_2','0'],['Feb-2021','P_1','Type_3','21'],
                 ['Mar-2021','P_2','Type_1','3'],['Mar-2021','P_2','Type_2','6'],['Mar-2021','P_2','Type_3','9']],
                 columns = ['Month-Yr','Product','Type','Numerical'])

   Month-Yr Product Type    Numerical
0   Jan-2021    P_1 Type_1  5
1   **Jan-2021  P_1 Type_2  10**
2   Jan-2021    P_1 Type_3  15
3   Feb-2021    P_1 Type_1  7
4   **Feb-2021  P_1 Type_2  0**
5   Feb-2021    P_1 Type_3  21
6   Mar-2021    P_2 Type_1  3
7   **Mar-2021  P_2 Type_2  6**
8   Mar-2021    P_2 Type_3  9

Expected result:

     Month-Yr   Product Type    Numerical
0   Jan-2021    P_1 Type_1  5
1   **Jan-2021  P_1 Type_2  0.2**
2   Jan-2021    P_1 Type_3  15
3   Feb-2021    P_1 Type_1  7
4   Feb-2021    P_1 Type_2  0
5   Feb-2021    P_1 Type_3  21
6   Mar-2021    P_2 Type_1  3
7   **Mar-2021  P_2 Type_2  0.12**
8   Mar-2021    P_2 Type_3  9

CodePudding user response:

Update:

for i in range(1, len(df_in)):
    df_in.loc[i, "Numerical"] = np.where(df_in.loc[i, "Type"] == "Type_2" and 
                                         float(df_in.loc[i, "Numerical"]) > 0, 
                                         float(df_in.loc[i-1, "Numerical"]) * 0.04, 
                                         df_in.loc[i, "Numerical"])

Output:

    Month-Yr    Type    Numerical
0   Jan-2021    Type_1  5
1   Jan-2021    Type_2  0.2
2   Jan-2021    Type_3  15
3   Feb-2021    Type_1  7
4   Feb-2021    Type_2  0
5   Feb-2021    Type_3  21
6   Mar-2021    Type_1  3
7   Mar-2021    Type_2  0.12
8   Mar-2021    Type_3  9

CodePudding user response:

One option is to flip the data into horizontal form, before computing the conditions, and finally flipping back to vertical form; this should be faster than a for loop, as it is vectorized and relies on Pandas methods:

# you can ignore this,
# if you are not fuzzy on the order
from pandas.api.types import CategoricalDtype
categories = CategoricalDtype(categories = df_in['Month-Yr'].unique(), 
                              ordered = True)
temp = (df_in
         # need the column as numbers
        .astype({'Numerical': int, 'Month-Yr':categories})
        .pivot(index=['Month-Yr', 'Product'], 
               columns='Type', 
               values='Numerical')
                # conditional statement here,
                # similar to python's if-else statement                 
        .assign(Type_2 = lambda df: np.where(df.Type_2 > 0,
                                             df.Type_1 * 0.04,
                                             df.Type_2))
        .stack()
        .reset_index(name='NUmerical')
      )


   Month-Yr Product    Type  Numerical
0  Jan-2021     P_1  Type_1       5.00
1  Jan-2021     P_1  Type_2       0.20
2  Jan-2021     P_1  Type_3      15.00
3  Feb-2021     P_1  Type_1       7.00
4  Feb-2021     P_1  Type_2       0.00
5  Feb-2021     P_1  Type_3      21.00
6  Mar-2021     P_2  Type_1       3.00
7  Mar-2021     P_2  Type_2       0.12
8  Mar-2021     P_2  Type_3       9.00

CodePudding user response:

Update:

You can use the pandas method iterrows (doc) to iterate over each row of your DataFrame and make the change only when it meets those conditions (where index-1 is the index of the Type_1 value) with:

for index, row in df_in.iterrows():
    if row['Type'] == 'Type_2' and float(row['Numerical']) > 0:
        row['Numerical'] = (float(df_in.loc[index-1, 'Numerical']) * 0.04)

Here is the result:

    Month-Yr    Type    Numerical
0   Jan-2021    Type_1  5
1   Jan-2021    Type_2  0.2
2   Jan-2021    Type_3  15
3   Feb-2021    Type_1  7
4   Feb-2021    Type_2  0
5   Feb-2021    Type_3  21
6   Mar-2021    Type_1  3
7   Mar-2021    Type_2  0.12
8   Mar-2021    Type_3  9
  •  Tags:  
  • Related