I am not sure how to solve the following. I am having a dataframe which looks like this:
df <- structure(list(category = c(1, 2, 3, 4, 5), f1 = c(2, 3, 2, 3, 1),
f2 = c(1, 2, 4 ,1, 2)), row.names = c(NA, -5L),
class = c("tbl_df", "tbl", "data.frame"))
The numbers in the variables (except the variable named "category") represent subjects which belong in one of those categories and f1 and f2 are a variable split into these two variables. I would like to "melt" them and those 2 in f1 variable for example would become 2 rows within a variable called "f" and they will have the number 1 while keeping the category next to them which is 1 in that case. F2, category 1 has 1 observation and would become 1 row within again the variable "f" and category 1 but it will have the number 2 this time.
So f1 will always produce rows which will have 1 in the column "f" and f2 will produce rows which will always have "2" in column "f".
Below is a visual example of what I am trying to achieve:
category f
1 1 1
2 1 1
3 1 2
4 2 1
5 2 1
6 2 1
7 2 2
8 2 2
9 3 1
10 3 1
11 3 2
12 3 2
13 3 2
14 3 2
Thank you in advance.
CodePudding user response:
df %>%
pivot_longer(cols=c("f1", "f2")) %>%
mutate(name=replace(name, name=="f1", 1)) %>%
mutate(name=replace(name, name=="f2", 2)) %>%
group_by(category,name,value) %>%
expand(value=seq(1:value)) %>%
rename(f=name) %>%
as.data.frame() %>%
select(-value)
category f
1 1 1
2 1 1
3 1 2
4 2 1
5 2 1
6 2 1
7 2 2
8 2 2
9 3 1
10 3 1
11 3 2
12 3 2
13 3 2
14 3 2
15 4 1
16 4 1
17 4 1
18 4 2
19 5 1
20 5 2
21 5 2
CodePudding user response:
This is a perfect task for tidyr::uncount:
library(tidyr)
df %>%
pivot_longer(cols= c("f1", "f2"), names_to = "f", names_transform = list(f = seq)) %>%
uncount(value)
Output:
df
category f
1 1 1
2 1 1
3 1 2
4 2 1
5 2 1
6 2 1
7 2 2
8 2 2
9 3 1
10 3 1
11 3 2
12 3 2
13 3 2
14 3 2
15 4 1
16 4 1
17 4 1
18 4 2
19 5 1
20 5 2
21 5 2
CodePudding user response:
With base R:
nGroups = length(df$category)
df <- data.frame(category = rep(seq(nGroups), df$f1 df$f2),
f = rep(rep(1:2, nGroups), c(rbind(df$f1, df$f2))))
category f
1 1 1
2 1 1
3 1 2
4 2 1
5 2 1
6 2 1
7 2 2
8 2 2
9 3 1
10 3 1
11 3 2
12 3 2
13 3 2
14 3 2
15 4 1
16 4 1
17 4 1
18 4 2
19 5 1
20 5 2
21 5 2
