I have a DataFrame like this:
df = pd.DataFrame({
    'id': ['A','A','A','B','B','C','C','C','C'],
    'groupId': [11,35,46,11,26,25,39,50,55],
    'type': [1,1,1,1,1,2,2,2,2],
})
I want to turn each group into a NumPy array that also contains the group's type value, and collect those arrays in a list. I tried:
df.groupby(['id','type'])['groupId'].apply(np.array).tolist()
This almost works, but I also want the type value at the very beginning of each NumPy array. What I want is:
[
np.array([1,11,35,46]),
np.array([1,11,26]),
np.array([2,25,39,50,55])
]
I feel this should be easy, but I am stuck.
CodePudding user response:
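Use x.name to get the type value and prepend it to the array. Inside apply, x.name is the group key, here the tuple (id, type), so x.name[1] is the type: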
a = df.groupby(['id','type'])['groupId'].apply(lambda x: np.array([x.name[1], *x])).tolist()
print(a)
[array([ 1, 11, 35, 46], dtype=int64),
array([ 1, 11, 26], dtype=int64),
array([ 2, 25, 39, 50, 55], dtype=int64)]
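The same result can also be built by iterating over the groupby object, which yields the key tuple together with each group's values. This is only a sketch of that alternative; the names id_, type_, values and arrays are illustrative and not from the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': ['A','A','A','B','B','C','C','C','C'],
    'groupId': [11,35,46,11,26,25,39,50,55],
    'type': [1,1,1,1,1,2,2,2,2],
})

# Each iteration yields ((id, type), Series of groupId values for that group).
arrays = [
    np.concatenate(([type_], values.to_numpy()))  # type first, then the groupIds
    for (id_, type_), values in df.groupby(['id', 'type'])['groupId']
]
print(arrays)
```

The list comprehension keeps the key unpacking explicit, which may be easier to read than indexing into x.name.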
CodePudding user response:
First group by id and type, aggregating only groupId into a list. Then you can assign a new group column that lists type and groupId together in one flat list. This is possible with a flatten helper.
df = df.groupby(['id', 'type'], as_index=False).agg({
    'groupId': list
})
df
  id  type           groupId
0  A     1      [11, 35, 46]
1  B     1          [11, 26]
2  C     2  [25, 39, 50, 55]
Flatten from this link:
def flatten(foo):
    for x in foo:
        # Recurse into any iterable except strings, which are kept whole
        if hasattr(x, '__iter__') and not isinstance(x, str):
            yield from flatten(x)
        else:
            yield x
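To see what the helper does on its own, here is a small sketch applied to a plain Python list shaped like one row of the grouped frame (a type value followed by a list of groupId values). The variable names row and flat are illustrative only.

```python
def flatten(foo):
    for x in foo:
        # Recurse into any iterable except strings, which are kept whole
        if hasattr(x, '__iter__') and not isinstance(x, str):
            yield from flatten(x)
        else:
            yield x

row = [1, [11, 35, 46]]   # type value, then the list of groupIds
flat = list(flatten(row))
print(flat)  # [1, 11, 35, 46]
```

Because flatten is a generator, it has to be wrapped in list() to materialize the values.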
Then you can create a flat list of type and groupId for each row:
df = df.assign(group=df[['type', 'groupId']].apply(lambda x: list(flatten(x)), axis = 1))
df
  id  type           groupId                group
0  A     1      [11, 35, 46]      [1, 11, 35, 46]
1  B     1          [11, 26]          [1, 11, 26]
2  C     2  [25, 39, 50, 55]  [2, 25, 39, 50, 55]
Finally, convert each list to a NumPy array:
df['group'].apply(np.array).tolist()
[array([ 1, 11, 35, 46]), array([ 1, 11, 26]), array([ 2, 25, 39, 50, 55])]
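Since type is a scalar and groupId is a single-level list here, a recursive flatten is not strictly needed. As a hedged alternative sketch, the two columns can be zipped and unpacked directly; the names grouped and result are assumptions for illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': ['A','A','A','B','B','C','C','C','C'],
    'groupId': [11,35,46,11,26,25,39,50,55],
    'type': [1,1,1,1,1,2,2,2,2],
})

# One row per (id, type), with groupId collected into a Python list
grouped = df.groupby(['id', 'type'], as_index=False).agg({'groupId': list})

# Put the type value first, then unpack the groupId list after it
result = [np.array([t, *g]) for t, g in zip(grouped['type'], grouped['groupId'])]
print(result)
```

This avoids the row-wise apply, at the cost of only handling one level of nesting.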
