Adding a stratified variable to a data frame in R

I have some data that I want to split into 4 equal parts, stratified by group.

My dataframe looks like this:

X	Group
1	1
2	1
3	1
4	1
5	1
6	1
7	2
8	2
9	3
10	3
11	3
12	3
13	3
14	3
15	3
16	3

I thought about adding a third column that marks which split each row belongs to, like this:

X	Group	Split
1	1	1
2	1	3
3	1	2
4	1	4
5	1	4
6	1	2
7	2	3
8	2	1
9	3	1
10	3	2
11	3	3
12	3	4
13	3	1
14	3	2
15	3	3
16	3	4

I don't need to actually split the dataset: the data are videos, and I just have to mark who (which person) has to watch each one.

I know how to generate random numbers, but I need them to be stratified by group.

I know how to draw a stratified sample, but that's not what I want: I want to distribute ALL of the data (the videos) across the splits, in a stratified fashion.

How can I achieve this?

Thank you!

Edit: I changed the example to unequally sized groups.
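For reproducibility, the edited example table can be rebuilt as an R data frame. The name `df` is only for illustration; the values are copied from the table above.

```r
# Edited example: 16 videos in three groups of unequal size (6, 2, 8)
df <- data.frame(
  X     = 1:16,
  Group = c(rep(1, 6), rep(2, 2), rep(3, 8))
)

table(df$Group)  # group sizes
```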

Answer:

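You can do this kind of stratified operation with dplyr::group_by(): within each group, shuffle the row positions and map them cyclically onto splits 1 to 4, so every group is spread as evenly as possible over the four splits: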

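library(dplyr)  # only dplyr is needed; library(tidyverse) also loads it

df <- data.frame(
    X = 1:12,
    Group = c(rep(1, 4), rep(2, 4), rep(3, 4))
)

set.seed(1)  # optional: makes the random assignment reproducible

df %>%
  group_by(Group) %>%
  # random permutation of 1..n within the group, mapped onto splits 1-4
  mutate(Split = sample(seq_along(X), size = n(), replace = FALSE) %% 4 + 1) %>%
  ungroup()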
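With the unequal groups from the edited question, the per-group modulo trick still balances splits within each group, but every group's leftover rows go to the same splits (split 2, then 3, then 4), so the overall totals drift apart: with group sizes 6, 2 and 8 they come out as 3, 5, 5, 3. A minimal base R sketch of one alternative, assuming it is acceptable to deal splits across the whole table rather than restarting in each group: order the rows by group with a random order inside each group, then deal 1, 2, 3, 4, 1, 2, ... along that order. The variable names `n_splits` and `ord` are just illustrative.

```r
# Example data from the edited question (unequal group sizes)
df <- data.frame(
  X     = 1:16,
  Group = c(rep(1, 6), rep(2, 2), rep(3, 8))
)

n_splits <- 4L
set.seed(123)  # optional, for a reproducible assignment

# Row indices sorted by Group, with a random order inside each group
ord <- order(df$Group, runif(nrow(df)))

# Deal splits 1, 2, 3, 4, 1, 2, ... along that order, like dealing cards.
# Each group is a contiguous run, so its split counts differ by at most 1,
# and the overall split totals also differ by at most 1.
df$Split <- NA_integer_
df$Split[ord] <- (seq_along(ord) - 1L) %% n_splits + 1L

# Check: rows = groups, columns = splits
table(df$Group, df$Split)
```

Dealing along the whole ordered table, instead of restarting at split 1 in every group, is what keeps the totals even: with 16 videos each split gets exactly 4.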