Pandas groups into numpy arrays including the group info

I have a dataframe like this:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': ['A','A','A','B','B','C','C','C','C'],
    'groupId': [11,35,46,11,26,25,39,50,55],
    'type': [1,1,1,1,1,2,2,2,2],
})

I want to turn each group into a numpy array that also contains the type value, and collect the arrays in a list. I tried:

df.groupby(['id','type'])['groupId'].apply(np.array).tolist()

This is almost what I need, but I also want the type value at the very beginning of each numpy array. The desired output is:

[
np.array([1,11,35,46]),
np.array([1,11,26]),
np.array([2,25,39,50,55])
]

I feel this should be easy, but I am stuck.

CodePudding user response:

Inside apply, x.name holds the group key, here the tuple (id, type), so x.name[1] is the type value. Prepend it when building the np.array:

a = df.groupby(['id','type'])['groupId'].apply(lambda x: np.array([x.name[1], *x])).tolist()
print(a)
[array([ 1, 11, 35, 46], dtype=int64),
 array([ 1, 11, 26], dtype=int64),
 array([ 2, 25, 39, 50, 55], dtype=int64)]
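The same idea can be written without apply by iterating over the groupby directly, since each iteration yields the (id, type) key together with the group's values. This is only a minimal sketch built on the question's dataframe; the names result, type_value and group_ids are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C'],
    'groupId': [11, 35, 46, 11, 26, 25, 39, 50, 55],
    'type': [1, 1, 1, 1, 1, 2, 2, 2, 2],
})

# Grouping by two columns makes each key a tuple (id, type);
# the second element is the value to put in front of the group's groupId values.
result = [
    np.array([type_value, *group_ids])
    for (_id, type_value), group_ids in df.groupby(['id', 'type'])['groupId']
]
```

Here result should hold one array per (id, type) pair, each starting with the type value.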

CodePudding user response:

First group by id and type, aggregating only groupId into a list. Then assign a new column, group, that lists type and groupId together; a flatten helper makes this possible.

df = df.groupby(['id', 'type'], as_index=False).agg({
    'groupId': list
})
df


    id  type    groupId
0   A   1   [11, 35, 46]
1   B   1   [11, 26]
2   C   2   [25, 39, 50, 55]

The flatten helper (taken from a linked answer):

def flatten(foo):
    # Recursively yield the items of nested iterables; strings are kept whole
    for x in foo:
        if hasattr(x, '__iter__') and not isinstance(x, str):
            for y in flatten(x):
                yield y
        else:
            yield x
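To see what flatten does on its own, here is a small sketch that applies it to a nested list shaped like one row of type and groupId. The sample values and the names row and flat are assumptions made for illustration.

```python
def flatten(foo):
    # Recursively yield the items of nested iterables; strings are kept whole
    for x in foo:
        if hasattr(x, '__iter__') and not isinstance(x, str):
            for y in flatten(x):
                yield y
        else:
            yield x

# One scalar followed by a list, like a (type, groupId) row after aggregation
row = [1, [11, 35, 46]]
flat = list(flatten(row))
print(flat)  # [1, 11, 35, 46]
```

Because flatten is a generator, it has to be wrapped in list() to get an actual list.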

Then you can create a flat list of type and groupId:

df = df.assign(group=df[['type', 'groupId']].apply(lambda x: list(flatten(x)), axis = 1))
df

    id  type    groupId         group
0   A   1   [11, 35, 46]        [1, 11, 35, 46]
1   B   1   [11, 26]            [1, 11, 26]
2   C   2   [25, 39, 50, 55]    [2, 25, 39, 50, 55]

Finally, convert each list into a numpy array:

df['group'].apply(np.array).tolist()

[array([ 1, 11, 35, 46]), array([ 1, 11, 26]), array([ 2, 25, 39, 50, 55])]
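If the intermediate group column is not needed, the flatten step can probably be skipped by zipping the type and groupId columns after the aggregation. This is a sketch of that variation, not the answer's original method; the names agg and arrays are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C'],
    'groupId': [11, 35, 46, 11, 26, 25, 39, 50, 55],
    'type': [1, 1, 1, 1, 1, 2, 2, 2, 2],
})

# Same aggregation as above: one row per (id, type) with groupId collected in a list
agg = df.groupby(['id', 'type'], as_index=False).agg({'groupId': list})

# Pair each type value with its list and unpack both into a single array
arrays = [np.array([t, *g]) for t, g in zip(agg['type'], agg['groupId'])]
```

This works here because each groupId cell is a flat list, so a single level of unpacking is enough.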