I have the following dataset with 4 columns:
> head(L12_17)
       species.2017 cooperative.2017     species.2012 cooperative.2012
1  Abrocoma cinerea               no Abrocoma cinerea               no
2 Acomys cineraceus               no Acinonyx jubatus               no
3      Acomys kempi               no Acomys cahirinus               no
4    Acomys louisae               no Acomys cilicicus               no
5     Acomys minous               no   Acomys ignitus               no
6  Acomys percivali               no     Acomys kempi               no
How can I save in column "species.2017" and column "species.2012" only those species that are present in both columns?
The end result will be to have a new dataset with 3 columns for "species name" "cooperative 2012" and "cooperative 2017", but I would like to keep in "species name" only those species (and their corresponding cooperative 2012 and cooperative 2017 data) that are present in "species.2017" AND "species.2012" columns. Thanks!
This is the end result I wish for:
> end.result
              species cooperative.2012 cooperative.2017
1        Acomys kempi               no              yes
2           Acomys 22               no               no
3          Acomys 444               no               no
4 Addax nasomaculatus              yes               no
This is my current data:
> dput(head(data, 20))
structure(list(species.2017 = c("Abrocoma cinerea", "Acomys cineraceus",
"Acomys kempi", "Acomys louisae", "Acomys minous", "Acomys percivali",
"Acomys russatus", "Acomys spinosissimus", "Acomys subspinosus",
"Acomys wilsoni", "Aconaemys fuscus", "Acrobates pygmaeus", "Addax nasomaculatus",
"Aepyceros melampus", "Aethomys chrysophilus", "Aethomys hindei",
"Aethomys kaiseri", "Ailuropoda melanoleuca", "Ailurus fulgens",
"Akodon azarae"), cooperative.2017 = c("no", "no", "no", "no",
"no", "no", "no", "no", "no", "no", "no", "no", "no", "no", "no",
"no", "no", "no", "no", "no"), species.2012 = c("Abrocoma cinerea",
"Acinonyx jubatus", "Acomys cahirinus", "Acomys cilicicus", "Acomys ignitus",
"Acomys kempi", "Acomys louisae", "Acomys minous", "Acomys mullah",
"Acomys nesiotes", "Acomys percivali", "Acomys russatus", "Acomys spinosissimus",
"Acomys subspinosus", "Acomys wilsoni", "Aconaemys fuscus", "Acrobates pygmaeus",
"Addax nasomaculatus", "Aepyceros melampus", "Aethomys chrysophilus"
), cooperative.2012 = c("no", "no", "no", "no", "no", "no", "no",
"no", "no", "no", "no", "no", "no", "no", "no", "no", "no", "no",
"no", "no")), row.names = c(NA, 20L), class = "data.frame")
CodePudding user response:
So you want to keep rows where either of the species columns holds a species that exists in both groups? The following code probably gets you what you need, although there are eight rows in which both species.2012 and species.2017 are common species. I'm not sure which one you want to keep.
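library(dplyr)

species.2017 <- df$species.2017
species.2012 <- df$species.2012
# species that occur in both years
common <- intersect(species.2017, species.2012)
df <- df %>%
  # keep rows where either species column holds a shared species
  filter(species.2012 %in% common | species.2017 %in% common) %>%
  # where both are shared, the 2012 name is the one kept
  mutate(species = ifelse(species.2012 %in% common, species.2012, species.2017)) %>%
  select(-c(species.2012, species.2017))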
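If the goal is one row per species shared by both years, with each year's own cooperative value, a join on the species name may be a closer fit than filtering rows, since the two species columns in the question are not aligned row by row. A minimal sketch, assuming dplyr is installed; the `toy` data frame and its values are made up for illustration:

```r
library(dplyr)

# Hypothetical toy data in the same layout as the question
toy <- data.frame(
  species.2017     = c("A", "B", "C"),
  cooperative.2017 = c("yes", "no", "no"),
  species.2012     = c("B", "C", "D"),
  cooperative.2012 = c("no", "yes", "no"),
  stringsAsFactors = FALSE
)

# One table per year, each keyed by a common `species` column
y2017 <- toy %>% select(species = species.2017, cooperative.2017)
y2012 <- toy %>% select(species = species.2012, cooperative.2012)

# inner_join() keeps only the species present in both tables
joined <- inner_join(y2012, y2017, by = "species")
joined
```

Here only "B" and "C" survive, and each keeps the 2012 and 2017 values recorded against its own name.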
CodePudding user response:
Here is a base R way.
First, create a logical index marking which values of species.2017 also occur in species.2012. Then build the vector of column indices to keep, renaming the first species column to "species" along the way. Finally, subset the data with those two vectors.
i <- data$species.2017 %in% data$species.2012
icol <- c(j <- grep("species", names(data))[1], grep("cooperative", names(data)))
names(data)[j] <- sub("\\..*$", "", names(data)[j])
result1 <- data[i, icol]
row.names(result1) <- NULL
result1
#>                  species cooperative.2017 cooperative.2012
#> 1       Abrocoma cinerea               no               no
#> 2           Acomys kempi               no               no
#> 3         Acomys louisae               no               no
#> 4          Acomys minous               no               no
#> 5       Acomys percivali               no               no
#> 6        Acomys russatus               no               no
#> 7   Acomys spinosissimus               no               no
#> 8     Acomys subspinosus               no               no
#> 9         Acomys wilsoni               no               no
#> 10      Aconaemys fuscus               no               no
#> 11    Acrobates pygmaeus               no               no
#> 12   Addax nasomaculatus               no               no
#> 13    Aepyceros melampus               no               no
#> 14 Aethomys chrysophilus               no               no
Created on 2022-01-30 by the reprex package (v2.0.1)
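One thing worth checking with the row-subsetting approach: `data[i, icol]` takes cooperative.2012 from the same row as species.2017, and in this layout that row's 2012 value belongs to whichever species sits in species.2012 there. The sample cannot show a difference because every value is "no". A possible adjustment, sketched on made-up toy data (the `toy` data frame is hypothetical), looks up the 2012 value by species name with `match()`:

```r
# Hypothetical toy data in the same layout as the question
toy <- data.frame(
  species.2017     = c("A", "B", "C"),
  cooperative.2017 = c("yes", "no", "no"),
  species.2012     = c("B", "C", "D"),
  cooperative.2012 = c("no", "yes", "no"),
  stringsAsFactors = FALSE
)

# Position of each 2017 species within the 2012 column (NA if absent)
pos  <- match(toy$species.2017, toy$species.2012)
keep <- !is.na(pos)

aligned <- data.frame(
  species          = toy$species.2017[keep],
  cooperative.2012 = toy$cooperative.2012[pos[keep]],  # value for the same species
  cooperative.2017 = toy$cooperative.2017[keep],
  stringsAsFactors = FALSE
)
aligned
```

In this toy example "C" gets cooperative.2012 = "yes", the value stored next to "C" in the 2012 columns, rather than the value that happens to share its row.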
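Another way, with merge. Since every species found in both species columns must appear in the final result, split the data into one data frame per year and merge the two on the species name. For this sample the result is identical to the result above. Note that the code starts from the original data, before the first species column was renamed.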
tmp1 <- data[1:2]
tmp2 <- data[3:4]
result2 <- merge(tmp1, tmp2, by.x = "species.2017", by.y = "species.2012")
names(result2)[1] <- "species"
rm(tmp1, tmp2)
identical(result1, result2)
#> [1] TRUE
Created on 2022-01-30 by the reprex package (v2.0.1)
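With `merge`, each year's value travels with its own species name, which is easier to see when the two years disagree. The sketch below reruns the same steps on made-up toy data (the `toy` data frame and its values are hypothetical) and reorders the columns to the layout asked for in the question:

```r
# Hypothetical toy data in the same layout as the question
toy <- data.frame(
  species.2017     = c("A", "B", "C"),
  cooperative.2017 = c("yes", "no", "no"),
  species.2012     = c("B", "C", "D"),
  cooperative.2012 = c("no", "yes", "no"),
  stringsAsFactors = FALSE
)

# merge() defaults to all = FALSE, i.e. only keys found in both inputs are kept
merged <- merge(toy[1:2], toy[3:4],
                by.x = "species.2017", by.y = "species.2012")
names(merged)[1] <- "species"

# Reorder to: species, cooperative.2012, cooperative.2017
merged <- merged[c("species", "cooperative.2012", "cooperative.2017")]
merged
```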
CodePudding user response:
It's difficult to know exactly what you want. One option: turn the values of one of the two columns into a regex alternation pattern, filter for the rows where the other column matches that pattern, drop one of the two now-identical columns, and rename the remaining one:
library(dplyr)
df %>%
  filter(grepl(paste0("\\b(", paste0(spec1, collapse = "|"), ")\\b"), spec2)) %>%
  select(-spec2) %>%
  rename(spec = spec1)
  spec smelse
1  XYZ      1
2    A      4
Toy data:
df <- data.frame(
spec1 = c("XYZ", "QWE", "P", "A"),
spec2 = c("XYZ", "abc", "Pothead", "A"),
smelse = 1:4
)
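Because the regex approach treats every species name as a pattern, names containing regex metacharacters, or names that occur as whole words inside longer names, could match unexpectedly. If exact, whole-string matches are what is wanted, `%in%` may be simpler. A base R sketch on the same toy data (repeated here so the block runs on its own; `res` is just an illustrative name):

```r
df <- data.frame(
  spec1  = c("XYZ", "QWE", "P", "A"),
  spec2  = c("XYZ", "abc", "Pothead", "A"),
  smelse = 1:4,
  stringsAsFactors = FALSE
)

# Keep rows whose spec2 value appears, as an exact string, anywhere in spec1
res <- df[df$spec2 %in% df$spec1, c("spec2", "smelse")]
names(res)[1] <- "spec"   # spec2 is the value known to occur in both columns
row.names(res) <- NULL
res
```

On this toy data the rows kept are the same as with the regex filter: "XYZ" and "A".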
