I have a csv file with the following structure
| Name | Position |
|---|---|
| Albert | Producer |
| John | Director |
| Anna | Volunteer |
| Gaston | Tech |
| other 51 names | other 51 positions |
I need to create random groups where each group has one member from each position. There are only four positions across the dataset. The final result would be something like
| Name | Position | Group |
|---|---|---|
| Albert | Producer | Group 1 |
| John | Director | Group 1 |
| Anna | Volunteer | Group 1 |
| Gaston | Tech | Group 1 |
| Martin | Producer | Group 2 |
| Vanessa | Director | Group 2 |
| Charles | Volunteer | Group 2 |
| Milan | Tech | Group 2 |
What is the most efficient way to create these groups?
Thank you in advance!
CodePudding user response:
You can use two groupby. One to shuffle the Positions using sample(frac=1), and the other to assign the groups:
(df.groupby('Position').pipe(lambda d: d.sample(frac=1))
.assign(Group=lambda d: d.groupby('Position').cumcount().add(1))
.sort_values(by=['Group', 'Position']) # optional
)
example output:
Name Position Group
1 John Director 1
4 Martin Producer 1
3 Gaston Tech 1
6 Charles Volunteer 1
5 Vanessa Director 2
0 Albert Producer 2
7 Milan Tech 2
2 Anna Volunteer 2
