The objective is to compute a running subtraction in which each row (N) is subtracted from the result of the previous row (N-1), with the rows separated into groups.
Given a df:
   years nchar  nval
0   2019     a     1
1   2019     b     1
2   2019     c     1
3   2020     a     1
4   2020     s     4
Let's separate out the group for year 2019 and denote it df_2019.
For df_2019, we assign a constant of 10.
Then, for index 0 only, we perform the following operation and assign the result to a new column 'B':
df_2019.loc[df_2019.index[0], 'B']= 10 - df_2019['nval'].values[0]
For every other index N:
df_2019.loc[df_2019.index[N], 'B'] = df_2019['B'].values[N-1] - df_2019['nval'].values[N]
This produces the following table:
   years nchar  nval C D  B
1   2019     a     1      9
2   2019     b     1      8
3   2019     c     1      7
For the 2020 group, the same computation applies. The only difference is that the constant is 7, which is taken from the last value of column B in the 2019 group.
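To make the rule concrete, here is a minimal plain-Python sketch of the recurrence described above. The helper name running_balance is hypothetical and only serves this illustration; the values are the 2019 and 2020 rows from the small example df.

```python
def running_balance(nvals, start):
    """Apply B[N] = B[N-1] - nval[N], using `start` as the value before the first row."""
    balance = start
    result = []
    for value in nvals:
        balance -= value
        result.append(balance)
    return result


# 2019 starts from the constant 10
b_2019 = running_balance([1, 1, 1], start=10)

# 2020 starts from the last B value of 2019
b_2020 = running_balance([1, 4], start=b_2019[-1])

print(b_2019)  # [9, 8, 7]
print(b_2020)  # [6, 2]
```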
To meet this requirement, I wrote the following code, which includes some additional groups:
import pandas as pd

year=[2019,2019,2019,2020,2020,2020,2020,2022,2022,2022]
nval=[1,1,1,1,4,1,4,5,6,7]
nchar=['a','b','c','a','s','c','a','b','c','g']
df=pd.DataFrame(zip(year,nchar,nval),columns=['years','nchar','nval'])
print(df)
year_ls=[2019,2020,2022]
nspacing_total=2
nspacing_between_df=4
all_df=[]
default_val=10
for idx,dyear in enumerate(year_ls):
    df_=df[df['years']==dyear].reset_index(drop=True)
    t=pd.DataFrame([[''] * 3]*len(df_), columns=["C", "D", "B"])
    df_=pd.concat([df_,t],axis=1)
    Total = df_['nval'].sum()
    # Prepend one blank header row (pd.concat replaces DataFrame.append, removed in pandas 2.0)
    df_=pd.concat([pd.DataFrame([[''] * len(df.columns)]*1, columns=df.columns), df_]).reset_index(drop=True)
    if idx ==0:
        # The first group starts from the default constant
        df_.loc[df_.index[0], 'B']=default_val
    if idx !=0:
        # Later groups start from the last B value of the previous group
        pre_df=all_df[idx-1]
        pre_val=pre_df['B'].values[-1]
        nposi=1
        pre_years=pre_df['years'].values[nposi]
        df_.loc[df_.index[0], 'nchar']=f'From {pre_years}'
        df_.loc[df_.index[0], 'B']=pre_val
    # B[N] = B[N-1] - nval[N]
    for ndexd in range(df_.shape[0]-1):
        df_.loc[df_.index[ndexd+1], 'B']=df_['B'].values[ndexd]-df_['nval'].values[ndexd+1]
    # Append the spacing rows; the last one becomes the Total row
    df_=pd.concat([df_, pd.DataFrame([[''] * len(df.columns)]*nspacing_total, columns=df.columns)]).reset_index(drop=True)
    df_.loc[df_.index[-1], 'nval']=Total
    df_.loc[df_.index[-1], 'nchar']='Total'
    df_.loc[df_.index[-1], 'B']=df_['B'].values[0]-df_['nval'].values[-1]
    all_df.append(df_)
However, I wonder whether this approach can be simplified using pandas groupby or something similar. I would appreciate any tips.
Ultimately, I would like to produce the table below, which will be exported to Excel:
    years      nchar  nval  C   D    B
 0                             10
 1   2019          a     1           9
 2   2019          b     1           8
 3   2019          c     1           7
 4
 5             Total     3           7
 6
 7
 8
 9
10         From 2019                 7
11   2020          a     1           6
12   2020          s     4           2
13   2020          c     1           1
14   2020          a     4          -3
15
16             Total    10          -3
17
18
19
20
21         From 2020                -3
22   2022          b     5          -8
23   2022          c     6         -14
24   2022          g     7         -21
25
26             Total    18         -21
27
28
29
30
The code used to produce the table above:
# Optional: assemble the per-year blocks into the table shown above
all_ap_df=[]
for a_df in all_df:
    # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
    spacer=pd.DataFrame([[''] * len(df.columns)]*nspacing_between_df, columns=df.columns)
    df=pd.concat([a_df,spacer]).reset_index(drop=True)
    all_ap_df.append(df)
df=pd.concat(all_ap_df,axis=0).reset_index(drop=True)
# Move the initial constant of the first row from column B to column D
df.loc[df_.index[0], 'D']=df['B'].values[0]
df.loc[df_.index[0], 'B']=''
df = df.fillna('')
CodePudding user response:
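I think this is actually quite simple. Because each group starts from the last B value of the previous group, the subtraction simply runs across the whole frame, so a plain cumsum is enough and no groupby is needed: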
df['B'] = 10 - df['nval'].cumsum()
Output:
>>> df
   years nchar  nval  B
0   2019     a     1  9
1   2019     b     1  8
2   2019     c     1  7
3   2020     a     1  6
4   2020     s     4  2
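Building on the cumsum above, the report layout from the question (a header row per year, a blank row, a Total row, then four spacer rows) could be assembled roughly as follows. This is only a sketch under some assumptions: the names START, COLUMNS, blank_row and report are made up for the illustration, the opening value is kept in column B rather than moved to D, and the spacing counts are hard-coded to match the question.

```python
import pandas as pd

df = pd.DataFrame({
    "years": [2019, 2019, 2019, 2020, 2020, 2020, 2020, 2022, 2022, 2022],
    "nchar": ["a", "b", "c", "a", "s", "c", "a", "b", "c", "g"],
    "nval": [1, 1, 1, 1, 4, 1, 4, 5, 6, 7],
})

START = 10  # opening constant from the question
COLUMNS = ["years", "nchar", "nval", "C", "D", "B"]

# Running balance across all years, as in the answer above
df["B"] = START - df["nval"].cumsum()


def blank_row():
    """Return one empty report row (hypothetical helper)."""
    return dict.fromkeys(COLUMNS, "")


rows = []
opening, prev_year = START, None
for year, grp in df.groupby("years", sort=False):
    # Header row: opening balance, labelled with the previous year if there is one
    label = "" if prev_year is None else f"From {prev_year}"
    rows.append({**blank_row(), "nchar": label, "B": opening})
    # Data rows for this year
    rows.extend({**blank_row(), **rec} for rec in grp.to_dict("records"))
    # One blank row, then the Total row carrying the closing balance
    closing = grp["B"].iloc[-1]
    rows.append(blank_row())
    rows.append({**blank_row(), "nchar": "Total",
                 "nval": grp["nval"].sum(), "B": closing})
    # Four spacer rows between year blocks
    rows.extend(blank_row() for _ in range(4))
    opening, prev_year = closing, year

report = pd.DataFrame(rows, columns=COLUMNS)
print(report)
```

The resulting frame has the same 31 rows as the table in the question and could then be written out with report.to_excel(...), which needs an Excel writer engine such as openpyxl to be installed.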
CodePudding user response:
In your case, chain groupby with cumsum and rsub. Note that this restarts from 10 within each year:
df['new'] = df.groupby('years')['nval'].cumsum().rsub(10)
Out[8]:
0    9
1    8
2    7
3    9
4    5
Name: nval, dtype: int64
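For comparison, here is a small sketch that puts the per-year version next to a running version on the same five-row df. The variable names per_year, running and opening are assumptions for the illustration. If the balance should carry over from one year to the next, as in the question, the running version is the one that matches; the opening value of each year can then be derived by shifting the per-year closing values.

```python
import pandas as pd

df = pd.DataFrame({
    "years": [2019, 2019, 2019, 2020, 2020],
    "nchar": ["a", "b", "c", "a", "s"],
    "nval": [1, 1, 1, 1, 4],
})

# Restarts from 10 within each year
per_year = df.groupby("years")["nval"].cumsum().rsub(10)

# Carries the balance across years
running = df["nval"].cumsum().rsub(10)

# Opening balance per year: previous year's closing value, 10 for the first year
opening = running.groupby(df["years"]).last().shift(fill_value=10)

print(per_year.tolist())  # [9, 8, 7, 9, 5]
print(running.tolist())   # [9, 8, 7, 6, 2]
print(opening.to_dict())  # {2019: 10, 2020: 7}
```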
