Calculating index values based on previous calculations using the same calculation-CodePudding

I have a df:

pd.DataFrame(index = ['A','B','C'],
             columns = ['1','2','3','4'], 
             data = [[100,60,40,60],
                     [200,10,50,80],
                     [50, np.nan, np.nan, np.nan]])

        1           2           3           4        
A       100         60          40          60
B       200         10          50          80
C       50

I would like to calculate the remaining C index values, but each calculation is dependent on the previous value like so:

        1           2           3           4        
A       100         60          40          60
B       200         10          50          80
C       50          A2 B2-C1    A3 B3-C2    A4 B4-C3

I checked this answer and tried the following:

new = [df.loc['C'].values]

for i in range(1, len(df.index)):
    new.append(new[i-1]*df.loc['A'].values[i] df.loc['B'].values[i]-df.loc['C'].values[i-1])
df.loc['C'] = new

But I get :

ValueError: cannot set a row with mismatched columns

Also, the question and answers are quite outdated, maybe there is a new solution for these recursive functions inside pandas dataframe?

CodePudding user response：

Key is : print your variables to ensure they contains what you think

First is that new = [df.loc['C'].values] builds a list with one item being an array, you just want one list
Then if the loop you're using new[i-1] *, which isn't present in the schema above
you use df.loc['C'].values[i-1] but you don't update it (you save in a list) so you can't expect it to work
- directly update the DF and use - df.loc['C'].values[i-1]
- keep the temporaty list and use - new[i - 1]
you don't want to append, but overwrite the values (or you'd have need to start new with only one value

With a separate list

new = df.loc['C'].to_list()

for i in range(1, len(df.columns)):
    new[i] = df.loc['A'].values[i]   df.loc['B'].values[i] - new[i - 1]

Without a separate list

for i in range(1, len(df.columns)):
    df.iloc[2, i] = df.iloc[1, i]   df.iloc[0, i] - df.iloc[2, i - 1]

CodePudding user response：

replace the empty spaces with NaN and leverage fillna to input sum of preceding rows. Used the loc accessor to subtract first column from each of the other columns.

df =df.replace('', np.nan).apply(lambda x: x.fillna(x.sum(0)))
df.loc['C','2':'4'] =df.loc['C','2':'4']-df.loc['C','1']

     1     2     3     4
A  100  60.0  40.0  60.0
B  200  10.0  50.0  80.0
C   50  20.0  40.0  90.0