How to modify a dataframe with conditions for each name duplicated-CodePudding

This is data that indicates whether a single person has capitalizable and non-capitalizable contracts. I want to have it grouped by names and create a new column that tells me if he/she has both contracts and the sum of all contracts. (its a pretty big one)

id	name	Capitalization	number of contracts
1	Jimmy	yes	3
1	Jimmy	no	1
2	Jenny	no	7
3	Elle	yes	5
4	Danny	yes	2
5	Charles	yes	1
6	Freddy	no	3
7	Elle	yes	5
7	Elle	no	3

The final result would look like this:

id	name	Capitalization_x	number of contracts
1	Jimmy	both	4
2	Jenny	no	7
3	Elle	yes	5
4	Danny	yes	2
5	Charles	yes	1
6	Freddy	no	3
7	Elle	both	8

I am really stuck with this problem.

CodePudding user response：

Using aggregation:

(df.groupby('id').agg({'Capitalization': lambda s: 'both' if len(set(s))==2 else s, 
                       'number of contracts': 'sum',
                       'name': 'first'}))

Here, I assume the only possible values in Capitalization are "yes" or "no". Let me know if otherwise.

You can always change the logic in the lambda function to accommodate more complex cases.

       name Capitalization  number of contracts
id                                             
1     Jimmy           both                    4
2     Jenny             no                    7
3      Elle            yes                    5
4     Danny            yes                    2
5   Charles            yes                    1
6    Freddy             no                    3
7      Elle           both                    8