This is data that indicates whether a single person has capitalizable and non-capitalizable contracts. I want to have it grouped by names and create a new column that tells me if he/she has both contracts and the sum of all contracts. (its a pretty big one)
| id | name | Capitalization | number of contracts |
|---|---|---|---|
| 1 | Jimmy | yes | 3 |
| 1 | Jimmy | no | 1 |
| 2 | Jenny | no | 7 |
| 3 | Elle | yes | 5 |
| 4 | Danny | yes | 2 |
| 5 | Charles | yes | 1 |
| 6 | Freddy | no | 3 |
| 7 | Elle | yes | 5 |
| 7 | Elle | no | 3 |
The final result would look like this:
| id | name | Capitalization_x | number of contracts |
|---|---|---|---|
| 1 | Jimmy | both | 4 |
| 2 | Jenny | no | 7 |
| 3 | Elle | yes | 5 |
| 4 | Danny | yes | 2 |
| 5 | Charles | yes | 1 |
| 6 | Freddy | no | 3 |
| 7 | Elle | both | 8 |
I am really stuck with this problem.
CodePudding user response:
Using aggregation:
(df.groupby('id').agg({'Capitalization': lambda s: 'both' if len(set(s))==2 else s,
'number of contracts': 'sum',
'name': 'first'}))
Here, I assume the only possible values in Capitalization are "yes" or "no". Let me know if otherwise.
You can always change the logic in the lambda function to accommodate more complex cases.
name Capitalization number of contracts
id
1 Jimmy both 4
2 Jenny no 7
3 Elle yes 5
4 Danny yes 2
5 Charles yes 1
6 Freddy no 3
7 Elle both 8
