I have the dataframe as follow:
dataframe generator:
df = pd.DataFrame({
'year':[2000,2001,2002]*3,
'id':['a']*3 ['b']*3 ['c']*3,
'othernulcol': ['xyz']*3 [np.nan]*4 ['tyu']*2,
'val':[np.nan,2,3,4,5,6,7,8,9]
})
data looks like:
year id othernulcol val
0 2000 a xyz NaN
1 2001 a xyz 2.0
2 2002 a xyz 3.0
3 2000 b NaN 4.0
4 2001 b NaN 5.0
5 2002 b NaN 6.0
6 2000 c NaN 7.0
7 2001 c tyu 8.0
8 2002 c tyu 9.0
I want to create new 3 rows from 2000 to 2002 that is the sum of row with id = a and b in the same year. othernulcol is just other column in dataframe. When creating new rows, just set those cols as np.NaN
Expected output:
year id othernulcol val
0 2000 a xyz NaN
1 2001 a xyz 2.0
2 2002 a xyz 3.0
3 2000 b NaN 4.0
4 2001 b NaN 5.0
5 2002 b NaN 6.0
6 2000 c NaN 7.0
7 2001 c tyu 8.0
8 2002 c tyu 9.0
9 2000 ab NaN NaN
10 2001 ab NaN 10.0
11 2002 ab NaN 12.0
Thank you for reading
CodePudding user response:
Filter values by categories and convert year to index for align same years from another DataFrame, sum values by DataFrame.add and append to original DataFrame by concat:
cols = ['id','val']
df1 = df[df['id'].eq('a')].set_index('year')[cols]
df2 = df[df['id'].eq('b')].set_index('year')[cols]
df = pd.concat([df, df1.add(df2).reset_index()], ignore_index=True)
print (df)
year id othernulcol val
0 2000 a xyz NaN
1 2001 a xyz 2.0
2 2002 a xyz 3.0
3 2000 b NaN 4.0
4 2001 b NaN 5.0
5 2002 b NaN 6.0
6 2000 c NaN 7.0
7 2001 c tyu 8.0
8 2002 c tyu 9.0
9 2000 ab NaN NaN
10 2001 ab NaN 7.0
11 2002 ab NaN 9.0
CodePudding user response:
Another solution could be as follows:
- Select rows from
dfwithdf.id.isin(['a','b'](seeSeries.isin) and applydf.groupbytoyear. - For the aggregration, use
sumfor columnid. For columnvaluse a lambda function to applySeries.sum, which allowsskipna=False. - Finally, use
pd.concatto add the result to the originaldfwith ignoring the index.
out = pd.concat([df,df[df.id.isin(['a','b'])]\
.groupby('year', as_index=False)\
.agg({'id':'sum',
'val':lambda x: x.sum(skipna=False)})],
ignore_index=True)
print(out)
year id othernulcol val
0 2000 a xyz NaN
1 2001 a xyz 2.0
2 2002 a xyz 3.0
3 2000 b NaN 4.0
4 2001 b NaN 5.0
5 2002 b NaN 6.0
6 2000 c NaN 7.0
7 2001 c tyu 8.0
8 2002 c tyu 9.0
9 2000 ab NaN NaN
10 2001 ab NaN 7.0
11 2002 ab NaN 9.0
