Replace values of 0 in dataframe using mathematical formulas-CodePudding

I have two Pandas dataframes as below and I am trying to replace the 0 values from column Price in the first one.

The first dataframe looks like this:

Region	Country	Product	Year	Price
Africa	South Africa	ABC	2016	500
Africa	South Africa	ABC	2017	400
Africa	South Africa	ABC	2018	0
Africa	South Africa	ABC	2019	450
Africa	South Africa	XYZ	2016	750
Africa	South Africa	XYZ	2017	0
Africa	South Africa	XYZ	2018	0
Africa	South Africa	XYZ	2019	890
Asia	Japan	DEF	2016	0
Asia	Japan	DEF	2017	0
Asia	Japan	DEF	2018	415
Asia	Japan	DEF	2019	0

It contains price for each type of product for each country for a given year. Some countries have values of 0 for certain years as that data is missing.

The second dataframe contains the average yearly price growth for each region and product and looks like this.

Region	Product	Average Yearly Price Growth
Africa	ABC	0.043
Africa	DEF	0.071
Africa	XYZ	0.0128
Asia	XYZ	0.05
Asia	ABC	-0.009
Asia	DEF	0.021

The task is to replace all 0 values in the first dataframe with the previous year's value for that particular product in that particular country, plus the average yearly rate increase (assuming the earliest data point is available).

In the case of product DEF in Japan, I would want to detect the earliest year where data is available and calculate all the previous years where Price is 0 from that point as follows:

Product DEF in Japan in 2018 has a price of 415 and is the earliest available data point for Price. Therefore the price for 2017 should be 415/(1 0.021)=406.464. After that, the value for 2016 should be 406.464/(1 0.021). Once these values have been calculated, the value for DEF in Japan in 2019 should be 415 415*0.021=423.715.

For other countries such as South Africa where the price for the earliest year (2016) is available it should be easier. For example, for product XYZ in South Africa in 2017, instead of 0 the value should be 750 750*0.0128=759.6. Where 750 is the previous year value for that product in that country and 0.0128 is the average yearly price growth for that product in Africa (from dataframe 2).

Then for the same product but in 2018, the value should be the previously calculated one, 759.6 759.6*0.0128=769.32.

Here is the sample data:

data1 = {'Region': ['Africa','Africa','Africa','Africa','Africa','Africa','Africa','Africa','Asia','Asia','Asia','Asia'],
         'Country': ['South Africa','South Africa','South Africa','South Africa','South Africa','South Africa','South Africa','South Africa','Japan','Japan','Japan','Japan'],
         'Product': ['ABC','ABC','ABC','ABC','XYZ','XYZ','XYZ','XYZ','DEF','DEF','DEF','DEF'],
         'Year': [2016, 2017, 2018, 2019,2016, 2017, 2018, 2019,2016, 2017, 2018, 2019],
         'Price': [500, 400, 0,450,750,0,0,890,0,0,415,0]}

data2 = {'Region': ['Africa','Africa','Africa','Asia','Asia','Asia'],
         'Product': ['ABC','DEF','XYZ','XYZ','ABC','DEF'],
         'Average Yearly Price Growth': [0.043, 0.071, 0.0128,0.05,-0.009,0.021]}

df = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

My code so far is below, but it only works for situations where the earlies available point is available, like South Africa. Any ideas on how to deal with situations like Japan?

for index,rows in df.iterrows():
     if rows['Price']==0:
        try:
                    past_year_value = df[(df['Country']==rows['Country']) & (df['Product']==rows['Product']) & (df['Region']==rows['Region']) & (df['Year']==rows['Year']-1)]['Price'].values[0]
                    df.at[index,'Price'] = past_year_value   past_year_value*df2[(df2['Product']==rows['Product']) & (df2['Region']==rows['Region'])]['Average Yearly Price Growth'].values[0]
        except:
                    pass

CodePudding user response：

Simple fix add another for loop after that, this is more like forward fill and backward fill for the value of 0 , since we need to step , just like bfill and ffill chain

for index,rows in df.iterrows():
     if rows['Price']==0:
        try:
                    past_year_value = df[(df['Country']==rows['Country']) & (df['Product']==rows['Product']) & (df['Region']==rows['Region']) & (df['Year']==rows['Year']-1)]['Price'].values[0]
                    df.at[index,'Price'] = past_year_value   past_year_value*df2[(df2['Product']==rows['Product']) & (df2['Region']==rows['Region'])]['Average Yearly Price Growth'].values[0]
        except:
                    pass

df = df.iloc[::-1]
for index,rows in df.iterrows():
     if rows['Price']==0:
        try:
                    past_year_value = df[(df['Country']==rows['Country']) & (df['Product']==rows['Product']) & (df['Region']==rows['Region']) & (df['Year']==rows['Year'] 1)]['Price'].values[0]
                    df.at[index,'Price'] = past_year_value /(1 df2[(df2['Product']==rows['Product']) & (df2['Region']==rows['Region'])]['Average Yearly Price Growth'].values[0])
        except:
                    pass
df = df.iloc[::-1]
Out[88]: 
    Region       Country Product  Year  Price
0   Africa  South Africa     ABC  2016    500
1   Africa  South Africa     ABC  2017    400
2   Africa  South Africa     ABC  2018    417
3   Africa  South Africa     ABC  2019    450
4   Africa  South Africa     XYZ  2016    750
5   Africa  South Africa     XYZ  2017    759
6   Africa  South Africa     XYZ  2018    768
7   Africa  South Africa     XYZ  2019    890
8     Asia         Japan     DEF  2016    397
9     Asia         Japan     DEF  2017    406
10    Asia         Japan     DEF  2018    415
11    Asia         Japan     DEF  2019    423