How can I get an array that aggregates the grouped column into a single entity (list/array) per group, with one entry per original row, while returning NaN for rows that do not match the where-clause condition?
# example
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'flag': [1, 1, 0, 0],
                    'part': ['a', 'b', np.nan, np.nan],
                    'id': [1, 1, 2, 3]})
# my try
np.where(df1['flag'] == 1, df1.groupby(['id'])['part'].agg(np.array), df1.groupby(['id'])['part'].agg(np.array))
# operands could not be broadcast together with shapes (4,) (3,) (3,)
# expected
np.array((np.array(('a', 'b')), np.array(('a', 'b')), np.nan, np.nan), dtype=object)
CodePudding user response:
Drop the rows that have NaN in the part column, then group the remaining rows by id and aggregate part with list. Finally, map the aggregated Series onto the id column to get one entry per row; ids with no remaining rows have no match and come back as NaN.
s = df1.dropna(subset=['part']).groupby('id')['part'].agg(list)
df1['id'].map(s).to_numpy()
array([list(['a', 'b']), list(['a', 'b']), nan, nan], dtype=object)
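If it helps, here is a self-contained sketch of the same idea. It also shows why the np.where attempt fails: the groupby result has one entry per id (3), while the condition has one entry per row (4). The final .where(df1['flag'] == 1) step is my own assumption, not part of the answer above: it only matters if a row with flag 0 could still have a non-NaN part. The names grouped, mapped and result are just illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'flag': [1, 1, 0, 0],
                    'part': ['a', 'b', np.nan, np.nan],
                    'id': [1, 1, 2, 3]})

# Aggregating per group gives one entry per id (3 here), not one per row (4),
# which is why np.where could not broadcast it against df1['flag'] == 1.
grouped = df1.groupby('id')['part'].agg(list)

# One list per id, built only from rows that actually have a part.
s = df1.dropna(subset=['part']).groupby('id')['part'].agg(list)

# map() looks up each row's id in s, so the result is aligned with df1's rows;
# ids missing from s become NaN.
mapped = df1['id'].map(s)

# Assumption: rows whose flag is not 1 should be NaN regardless of their part.
result = mapped.where(df1['flag'] == 1).to_numpy()
```

With the sample data, mapped and result hold the same values, because every row with flag 0 already has a NaN part.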
