I have a data frame df with 5 columns. Region.Label indicates the Region where the study was performed, Sample.Label is a specific area within that Region where I counted for birds, Sp is the bird species I found in that specific area, Distance is the distance between the bird and me, and Effort is the time I was in the area looking for birds. When Distance is NA means that for this area the species was not observed. As an example of the data frame I have:
df <- data.frame(Region.Label=c("A","A","A","A","A","A","A","A"),
Sample.Label=c(1,1,1,2,2,2,3,3),
Sp=c("ZZ","ZZ","BB","ZZ","BB","CC","ZZ","BB"),
Distance=c(2,7,NA,NA,NA,6,NA,NA),
Effort=c(99,99,99,87,87,87,72,72))
df$Region.Label <- as.factor(df$Region.Label)
df$Sample.Label <- as.numeric(df$Sample.Label)
df
Region.Label Sample.Label Sp Distance Effort
1 A 1 ZZ 2 99
2 A 1 ZZ 7 99
3 A 1 BB NA 99
4 A 2 ZZ NA 87
5 A 2 BB NA 87
6 A 2 CC 6 87
7 A 3 ZZ NA 72
8 A 3 BB NA 72
Here, I would like to remove all the rows which have NA for the column df$Distance, as it indicates that the species was not observed in the area, but I want to delete rows with NA for df$Distance when that row with the NA is a duplicate of another row excluding the column df$Sp.
I would like to obtain this:
Region.Label Sample.Label Sp Distance Effort
1 A 1 ZZ 2 99
2 A 1 ZZ 7 99
3 A 2 CC 6 87
4 A 3 ZZ NA 72
In this example, I do not remove df[7,] because the Sample.Label is different from the previous rows. I remove df[8,] because df[7,] and df[8,] are equal except for df$Sp.
Does anyone know how to get what I want?
CodePudding user response:
Perhaps, a group by operation would help - grouped by 'Region.Label', 'Sample.Label', 'Effort', filter the non-NA elements of 'Distance' if there are any non-NA elements or else get the first row (row_number() == 1)
library(dplyr)
df %>%
group_by(Region.Label, Sample.Label, Effort) %>%
filter(if(all(is.na(Distance))) row_number() == 1 else !is.na(Distance)) %>%
ungroup
-output
# A tibble: 4 × 5
Region.Label Sample.Label Sp Distance Effort
<fct> <dbl> <chr> <dbl> <dbl>
1 A 1 ZZ 2 99
2 A 1 ZZ 7 99
3 A 2 CC 6 87
4 A 3 ZZ NA 72
