Home > database >  Calculating index values based on previous calculations using the same calculation
Calculating index values based on previous calculations using the same calculation

Time:01-12

I have a df:

pd.DataFrame(index = ['A','B','C'],
             columns = ['1','2','3','4'], 
             data = [[100,60,40,60],
                     [200,10,50,80],
                     [50, np.nan, np.nan, np.nan]])
        1           2           3           4        
A       100         60          40          60
B       200         10          50          80
C       50          

I would like to calculate the remaining C index values, but each calculation is dependent on the previous value like so:

        1           2           3           4        
A       100         60          40          60
B       200         10          50          80
C       50          A2 B2-C1    A3 B3-C2    A4 B4-C3

I checked this answer and tried the following:

new = [df.loc['C'].values]

for i in range(1, len(df.index)):
    new.append(new[i-1]*df.loc['A'].values[i] df.loc['B'].values[i]-df.loc['C'].values[i-1])
df.loc['C'] = new

But I get :

ValueError: cannot set a row with mismatched columns

Also, the question and answers are quite outdated, maybe there is a new solution for these recursive functions inside pandas dataframe?

CodePudding user response:

Key is : print your variables to ensure they contains what you think


  • First is that new = [df.loc['C'].values] builds a list with one item being an array, you just want one list

  • Then if the loop you're using new[i-1] *, which isn't present in the schema above

  • you use df.loc['C'].values[i-1] but you don't update it (you save in a list) so you can't expect it to work

    • directly update the DF and use - df.loc['C'].values[i-1]
    • keep the temporaty list and use - new[i - 1]
  • you don't want to append, but overwrite the values (or you'd have need to start new with only one value


With a separate list

new = df.loc['C'].to_list()

for i in range(1, len(df.columns)):
    new[i] = df.loc['A'].values[i]   df.loc['B'].values[i] - new[i - 1]

Without a separate list

for i in range(1, len(df.columns)):
    df.iloc[2, i] = df.iloc[1, i]   df.iloc[0, i] - df.iloc[2, i - 1]

CodePudding user response:

replace the empty spaces with NaN and leverage fillna to input sum of preceding rows. Used the loc accessor to subtract first column from each of the other columns.

df =df.replace('', np.nan).apply(lambda x: x.fillna(x.sum(0)))
df.loc['C','2':'4'] =df.loc['C','2':'4']-df.loc['C','1']

     1     2     3     4
A  100  60.0  40.0  60.0
B  200  10.0  50.0  80.0
C   50  20.0  40.0  90.0
  •  Tags:  
  • Related