I have a data.frame such as:
data = data.frame(plot = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
family = c("Fab", "Fab", "Fab", "Pip", "Fab", "Mel", "Myr", "Myr", "Fab"),
species = c("Fab", "Fab", "sp 1", "sp2", "Fab", "sp3", "sp4", "sp5", "sp1"))
What I'm trying to do: wherever the names in the family and species columns match within a row, keep the name in family and set the corresponding species cell to NA. I tried a loop, but that doesn't seem like a sensible approach since my data is pretty big...
CodePudding user response:
Using base R, you can select the rows where family and species are equal and assign NA to species in those rows:
data <- data.frame(plot = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
family = c("Fab", "Fab", "Fab", "Pip", "Fab", "Mel", "Myr", "Myr", "Fab"),
species = c("Fab", "Fab", "sp 1", "sp2", "Fab", "sp3", "sp4", "sp5", "sp1"),
stringsAsFactors = FALSE)
data[data$family == data$species, ]$species <- NA
data
#> plot family species
#> 1 1 Fab <NA>
#> 2 1 Fab <NA>
#> 3 1 Fab sp 1
#> 4 2 Pip sp2
#> 5 2 Fab <NA>
#> 6 3 Mel sp3
#> 7 3 Myr sp4
#> 8 3 Myr sp5
#> 9 3 Fab sp1
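One caveat with the assignment above: if family or species already contains NA, the comparison returns NA for that row, and row-indexing a data frame with NA in an assignment raises an error. A minimal sketch of a variant that avoids this, assuming a small made-up data frame with some missing values (the values here are illustrative, not from the question):

```r
# Hypothetical example data with missing values in both columns
data <- data.frame(plot = c(1, 1, 2, 3),
                   family = c("Fab", "Fab", "Pip", NA),
                   species = c("Fab", "sp 1", NA, "sp3"),
                   stringsAsFactors = FALSE)

# which() keeps only the positions where the comparison is TRUE,
# silently dropping rows where it evaluates to NA
same <- which(data$family == data$species)

# Index the species vector directly rather than the data frame rows
data$species[same] <- NA
data
```

Both versions are vectorised, so neither needs a loop even on a large data frame.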
CodePudding user response:
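With the tidyverse, case_when() can do the same (data is the data.frame from the question, converted to a tibble here):
library(tidyverse)
as_tibble(data) %>%
  # set species to NA where it equals family, otherwise keep it
  mutate(species = case_when(species == family ~ NA_character_,
                             TRUE ~ species))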
# A tibble: 9 × 3
plot family species
<dbl> <chr> <chr>
1 1 Fab NA
2 1 Fab NA
3 1 Fab sp 1
4 2 Pip sp2
5 2 Fab NA
6 3 Mel sp3
7 3 Myr sp4
8 3 Myr sp5
9 3 Fab sp1
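A shorter spelling of the same idea may be dplyr's na_if(), which replaces a value in its first argument with NA wherever it equals the second. The sketch below assumes a dplyr version in which the second argument can be a vector of the same length as the first, and rebuilds the question's data so it runs on its own:

```r
library(dplyr)

data <- data.frame(plot = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
                   family = c("Fab", "Fab", "Fab", "Pip", "Fab", "Mel", "Myr", "Myr", "Fab"),
                   species = c("Fab", "Fab", "sp 1", "sp2", "Fab", "sp3", "sp4", "sp5", "sp1"),
                   stringsAsFactors = FALSE)

# na_if() compares species with family element by element and
# returns NA where they are equal, the original species otherwise
result <- data %>%
  mutate(species = na_if(species, family))
result
```

Rows where species differs from family are left untouched, so "sp 1" and "sp1" stay as they are.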
