I have the following data:
df <- structure(list(automatic = c("organismo", "bolha", "organismo",
"organismo", "cosc_multiplo", "cosc_multiplo", "coscinodiscus",
"detritos", "mult_organismos", "multiplos", "organismo", "sombra",
"detritos", "mult_organismos", "detritos", "mult_organismos",
"detritos", "org_partes", "detritos", "organismo", "organismo",
"detritos", "organismo", "organismo", "organismo", "bolha", "coral_falso",
"coscinodiscus", "detritos", "LRaw", "multiplos", "organismo",
"sombra"), validated = c("appendicularia", "bolha", "cnidaria",
"copepodo", "cosc_multiplo", "coscinodiscus", "coscinodiscus",
"coscinodiscus", "coscinodiscus", "coscinodiscus", "coscinodiscus",
"coscinodiscus", "detritos", "detritos", "langanho", "mult_organismos",
"multiplos", "org_partes", "organismo", "organismo", "palmeria",
"pelotas_mix", "phyto", "phyto_cadeia", "phyto_espiral", "sombra",
"sombra", "sombra", "sombra", "sombra", "sombra", "sombra", "sombra"
), N = c(2L, 1L, 2L, 1L, 2L, 1L, 1229L, 3L, 2L, 4L, 5L, 57L,
1569L, 1L, 87L, 31L, 1L, 7L, 1L, 75L, 2L, 11L, 4L, 1L, 1L, 1L,
10L, 25L, 536L, 25L, 30L, 562L, 3678L)), row.names = c(NA, -33L
), class = c("tbl_df", "tbl", "data.frame"))
I would like to show all combinations of the values in the automatic and validated columns.
For example, I don't have the combination bolha (in the automatic column) with appendicularia (in the validated column). I would like to show this combination, and all the other absent ones, with a value of 0 in column N.
Combinations that already exist must keep their value in column N. For example, bolha (in the automatic column) with bolha (in the validated column) has an N of 1, and that must not change.
Thanks all
CodePudding user response:
If you want to get all unique combinations and maintain the original values for N, then you can first use crossing from tidyr to get all unique combinations. Then, we can do a left join to add in the N values from the original dataframe, and finally change NA to 0 for N.
library(tidyverse)
left_join(crossing(automatic = df$automatic, validated = df$validated),
df,
by = c("automatic", "validated")) %>%
replace_na(list(N = 0))
Or a shorter option is to simply use rows_update instead of doing a join:
crossing(automatic = df$automatic, validated = df$validated, N = 0) %>%
rows_update(df, by = c("automatic", "validated"))
Output
# A tibble: 198 × 3
automatic validated N
<chr> <chr> <int>
1 bolha appendicularia 0
2 bolha bolha 1
3 bolha cnidaria 0
4 bolha copepodo 0
5 bolha cosc_multiplo 0
6 bolha coscinodiscus 0
7 bolha detritos 0
8 bolha langanho 0
9 bolha mult_organismos 0
10 bolha multiplos 0
# … with 188 more rows
CodePudding user response:
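Here is an approach using expand_grid from tidyr, similar to @AndrewGB's solution:
library(dplyr)
library(tidyr)
# expand_grid() keeps repeated values, so drop duplicate pairs first;
# depending on the dplyr version, rows_update() may require unique keys in x
expand_grid(automatic = df$automatic, validated = df$validated, N = 0) %>%
distinct() %>%
rows_update(df, by = c("automatic", "validated")) %>%
arrange(automatic)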
automatic validated N
<chr> <chr> <dbl>
1 bolha appendicularia 0
2 bolha bolha 1
3 bolha cnidaria 0
4 bolha copepodo 0
5 bolha cosc_multiplo 0
6 bolha coscinodiscus 0
7 bolha detritos 0
8 bolha langanho 0
9 bolha mult_organismos 0
10 bolha multiplos 0
# … with 188 more rows
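For readers who prefer to avoid extra packages, the same three steps (build the grid, left join, replace NA with 0) can be sketched in base R with expand.grid and merge. This is only an illustration on a small made-up data frame, not the question's data; the names toy, grid, and full are assumptions introduced here.

```r
# Toy data standing in for the question's df (hypothetical values)
toy <- data.frame(
  automatic = c("bolha", "organismo", "organismo"),
  validated = c("bolha", "bolha", "copepodo"),
  N = c(1L, 4L, 2L),
  stringsAsFactors = FALSE
)

# All combinations of the unique values in each column
grid <- expand.grid(
  automatic = unique(toy$automatic),
  validated = unique(toy$validated),
  stringsAsFactors = FALSE
)

# Left join: keep every row of grid and pull in N where the pair exists
full <- merge(grid, toy, by = c("automatic", "validated"), all.x = TRUE)

# Pairs that were absent from toy come back as NA; set them to 0
full$N[is.na(full$N)] <- 0L

print(full)
```

Unlike expand_grid, this sketch passes unique() values explicitly, so no distinct step is needed afterwards.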
CodePudding user response:
There is also complete from tidyr, which is a wrapper around expand and a join:
library(tidyr)
df |>
complete(automatic, validated, fill = list(N = 0))
automatic validated N
<chr> <chr> <int>
1 bolha appendicularia 0
2 bolha bolha 1
3 bolha cnidaria 0
4 bolha copepodo 0
5 bolha cosc_multiplo 0
6 bolha coscinodiscus 0
7 bolha detritos 0
8 bolha langanho 0
9 bolha mult_organismos 0
10 bolha multiplos 0
# … with 188 more rows
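If you want unique combinations regardless of order, so that each pair of automatic and validated values appears only once when the pair is sorted, then with dplyr you can do:
library(dplyr)
library(tidyr)
df |>
complete(automatic, validated, fill = list(N = 0)) |>
# build an order-independent key for each pair, one row at a time
rowwise() |>
mutate(m = paste(sort(c(validated, automatic)), collapse = ", ")) |>
# within each key keep a single row, preferring the largest N
group_by(m) |>
filter(N == max(N)) |>
slice(1) |>
ungroup() |>
mutate(m = NULL)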
# A tibble: 162 × 3
automatic validated N
<chr> <chr> <int>
1 bolha appendicularia 0
2 coral_falso appendicularia 0
3 cosc_multiplo appendicularia 0
4 coscinodiscus appendicularia 0
5 detritos appendicularia 0
6 LRaw appendicularia 0
7 mult_organismos appendicularia 0
8 multiplos appendicularia 0
9 org_partes appendicularia 0
10 organismo appendicularia 2
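The order-independent key used above can also be sketched in base R. This is a minimal illustration on a made-up data frame (pairs, key, and dedup are hypothetical names introduced here), using pmin and pmax to sort each pair instead of rowwise and sort:

```r
# Toy data with a mirrored pair (hypothetical values)
pairs <- data.frame(
  automatic = c("bolha", "sombra", "bolha"),
  validated = c("sombra", "bolha", "bolha"),
  N = c(0L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Order-independent key: smaller string first, larger string second
pairs$key <- paste(
  pmin(pairs$automatic, pairs$validated),
  pmax(pairs$automatic, pairs$validated),
  sep = ", "
)

# Sort so the largest N comes first within each key,
# then keep only the first row per key
pairs <- pairs[order(pairs$key, -pairs$N), ]
dedup <- pairs[!duplicated(pairs$key), c("automatic", "validated", "N")]

print(dedup)
```

Here the mirrored rows bolha/sombra and sombra/bolha share one key, so only the row with the larger N is kept.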
