Python: Concatenating Strings from Grouped analysis-CodePudding

I need some help concatenating some strings from a DF.

I have the below "original" df:

df = pd.DataFrame({'Material': [1,1,1,2,2,2,3,3,3,3],
                'BOM': ['A','B',np.nan,'A',np.nan,'C','A','A','B','C']})

	Material	BOM
0	1	A
1	1	B
2	1	NaN
3	2	A
4	2	NaN
5	2	C
6	3	A
7	3	A
8	3	B
9	3	C

Expected Result is:

Material	BOM
1	A, B
2	A, C
3	A, B, C

I'm new with Python and am thinking of doing it with a mix of Loop and Dict but definitely think there must be a better way.

The tricky bit is that I think I need to group by material and check column "BOM" for the distinct options (different than NaN) then some sort of ", ".join them.

I've tried the following:

df.groupby('Material')['BOM'].apply(', '.join)

And got the error: "TypeError: sequence item 2: expected str instance, float found"

If anyone could help me I would be really grateful

CodePudding user response：

You can dropna and drop_duplicates, then GroupBy.agg:

df.dropna().drop_duplicates().groupby('Material').agg(', '.join)

or:

df.dropna().groupby('Material').agg(lambda s: ', '.join(s.drop_duplicates()))

output:

              BOM
Material         
1            A, B
2            A, C
3         A, B, C

CodePudding user response：

3 solutions depending on output requirements:

Using .agg: df.dropna().drop_duplicates().groupby('Material').agg(', '.join)

Output:

              BOM
Material         
1            A, B
2            A, C
3         A, B, C

Using .apply(list): df.dropna().drop_duplicates().groupby('Material')['BOM'].apply(list).reset_index()

Output:

   Material        BOM
0         1     [A, B]
1         2     [A, C]
2         3  [A, B, C]

Using .apply(set): df.dropna().groupby('Material')['BOM'].apply(set).reset_index()

Output:

   Material        BOM
0         1     {A, B}
1         2     {A, C}
2         3  {A, C, B}