I have the below df. Need to the replace the Value of 'Numerical' with a calculation as '4% of the Type_1' when 2 conditions are met: Type = Type_2 and the Numerical is > 0
df_in = pd.DataFrame([['Jan-2021','P_1','Type_1','5'],['Jan-2021','P_1','Type_2','10'],['Jan-2021','P_1','Type_3','15'],
['Feb-2021','P_1','Type_1','7'],['Feb-2021','P_1','Type_2','0'],['Feb-2021','P_1','Type_3','21'],
['Mar-2021','P_2','Type_1','3'],['Mar-2021','P_2','Type_2','6'],['Mar-2021','P_2','Type_3','9']],
columns = ['Month-Yr','Product','Type','Numerical'])
Month-Yr Product Type Numerical
0 Jan-2021 P_1 Type_1 5
1 **Jan-2021 P_1 Type_2 10**
2 Jan-2021 P_1 Type_3 15
3 Feb-2021 P_1 Type_1 7
4 **Feb-2021 P_1 Type_2 0**
5 Feb-2021 P_1 Type_3 21
6 Mar-2021 P_2 Type_1 3
7 **Mar-2021 P_2 Type_2 6**
8 Mar-2021 P_2 Type_3 9
Expected result:
Month-Yr Product Type Numerical
0 Jan-2021 P_1 Type_1 5
1 **Jan-2021 P_1 Type_2 0.2**
2 Jan-2021 P_1 Type_3 15
3 Feb-2021 P_1 Type_1 7
4 Feb-2021 P_1 Type_2 0
5 Feb-2021 P_1 Type_3 21
6 Mar-2021 P_2 Type_1 3
7 **Mar-2021 P_2 Type_2 0.12**
8 Mar-2021 P_2 Type_3 9
CodePudding user response:
Update:
for i in range(1, len(df_in)):
df_in.loc[i, "Numerical"] = np.where(df_in.loc[i, "Type"] == "Type_2" and
float(df_in.loc[i, "Numerical"]) > 0,
float(df_in.loc[i-1, "Numerical"]) * 0.04,
df_in.loc[i, "Numerical"])
Output:
Month-Yr Type Numerical
0 Jan-2021 Type_1 5
1 Jan-2021 Type_2 0.2
2 Jan-2021 Type_3 15
3 Feb-2021 Type_1 7
4 Feb-2021 Type_2 0
5 Feb-2021 Type_3 21
6 Mar-2021 Type_1 3
7 Mar-2021 Type_2 0.12
8 Mar-2021 Type_3 9
CodePudding user response:
One option is to flip the data into horizontal form, before computing the conditions, and finally flipping back to vertical form; this should be faster than a for loop, as it is vectorized and relies on Pandas methods:
# you can ignore this,
# if you are not fuzzy on the order
from pandas.api.types import CategoricalDtype
categories = CategoricalDtype(categories = df_in['Month-Yr'].unique(),
ordered = True)
temp = (df_in
# need the column as numbers
.astype({'Numerical': int, 'Month-Yr':categories})
.pivot(index=['Month-Yr', 'Product'],
columns='Type',
values='Numerical')
# conditional statement here,
# similar to python's if-else statement
.assign(Type_2 = lambda df: np.where(df.Type_2 > 0,
df.Type_1 * 0.04,
df.Type_2))
.stack()
.reset_index(name='NUmerical')
)
Month-Yr Product Type Numerical
0 Jan-2021 P_1 Type_1 5.00
1 Jan-2021 P_1 Type_2 0.20
2 Jan-2021 P_1 Type_3 15.00
3 Feb-2021 P_1 Type_1 7.00
4 Feb-2021 P_1 Type_2 0.00
5 Feb-2021 P_1 Type_3 21.00
6 Mar-2021 P_2 Type_1 3.00
7 Mar-2021 P_2 Type_2 0.12
8 Mar-2021 P_2 Type_3 9.00
CodePudding user response:
Update:
You can use the pandas method iterrows (doc) to iterate over each row of your DataFrame and make the change only when it meets those conditions (where index-1 is the index of the Type_1 value) with:
for index, row in df_in.iterrows():
if row['Type'] == 'Type_2' and float(row['Numerical']) > 0:
row['Numerical'] = (float(df_in.loc[index-1, 'Numerical']) * 0.04)
Here is the result:
Month-Yr Type Numerical
0 Jan-2021 Type_1 5
1 Jan-2021 Type_2 0.2
2 Jan-2021 Type_3 15
3 Feb-2021 Type_1 7
4 Feb-2021 Type_2 0
5 Feb-2021 Type_3 21
6 Mar-2021 Type_1 3
7 Mar-2021 Type_2 0.12
8 Mar-2021 Type_3 9
