Conditionally count and discount members of multiple groups

In this data:

df <- structure(list(Utterance = c("how old's your mom¿", 
                                   "how old's your mom¿", 
                                   "how old's your mom¿", 
                                   "how old's your mom¿", 
                                   "how old's your mom¿", 
                                   "how old's your mom¿", 
                                   "(0.855)", "(0.855)", "(0.855)", "eh six:ty:::-one=", "eh six:ty:::-one=", 
                                   "eh six:ty:::-one=", "[when was] that¿=", "[when was] that¿=", 
                                   "[when was] that¿=", "[when was] that¿=", "[yes] (0.163) =!this! was on °Wednesday°", 
                                   "[yes] (0.163) =!this! was on °Wednesday°", "[yes] (0.163) =!this! was on °Wednesday°", 
                                   "[yes] (0.163) =!this! was on °Wednesday°"), 
                     G_by = c("A","A", "A", "C", "C", "C", "B", "B", "B", "A", "B", "C", "A", "A", 
                                  "B", "C", "A", "A", "A", "B"), 
                     G_to = c("B", "*", "C", "A", "A", "B", "C", "A", "C", "C", "A", "B", "*", "C", "A", "A", "C", "*", 
                              "C", "A")), row.names = c(NA, -20L), class = c("tbl_df", "tbl", "data.frame"))

I need to count the number of G_to members in each group defined by Utterance and G_by, subject to these conditions:

discount *
discount a character if it is the same as the character before it, e.g., CC
discount a character if it is the same as the character before the preceding *, e.g., C*C
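The three rules can be checked on a single group's G_to vector with a small helper. This is a sketch, and `count_targets` is a hypothetical name: dropping the * values handles the first rule and also turns C*C into CC, and counting runs with base R's rle() makes a repeated character count only once.

```r
# Hypothetical helper: count the G_to members of ONE group under the three rules
count_targets <- function(g) {
  # Rule 1: discount "*"; removing it also makes C*C collapse to CC (rule 3)
  g <- g[g != "*"]
  # Rule 2: consecutive identical characters (CC) form one run, so count runs
  length(rle(g)$lengths)
}

count_targets(c("B", "*", "C"))  # 2
count_targets(c("A", "A", "B"))  # 2
count_targets(c("C", "A", "C"))  # 3
count_targets(c("C", "*", "C"))  # 1
```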

So far I only manage to discount *:

library(dplyr)

df %>%
  # one group per Utterance/G_by combination
  group_by(Utterance, G_by) %>% 
  # count every target that is not "*"
  mutate(N_G = sum(G_to %in% c("A", "B", "C")))

The result I am after is this:

# A tibble: 20 × 4
# Groups:   Utterance, G_by [11]
   Utterance                                G_by  G_to    N_G
   <chr>                                    <chr> <chr> <int>
 1 how old's your mom¿                      A     B         2
 2 how old's your mom¿                      A     *         2
 3 how old's your mom¿                      A     C         2
 4 how old's your mom¿                      C     A         2
 5 how old's your mom¿                      C     A         2
 6 how old's your mom¿                      C     B         2
 7 (0.855)                                  B     C         3
 8 (0.855)                                  B     A         3
 9 (0.855)                                  B     C         3
10 eh six:ty:::-one=                        A     C         1
11 eh six:ty:::-one=                        B     A         1
12 eh six:ty:::-one=                        C     B         1
13 [when was] that¿=                        A     *         1
14 [when was] that¿=                        A     C         1
15 [when was] that¿=                        B     A         1
16 [when was] that¿=                        C     A         1
17 [yes] (0.163) =!this! was on °Wednesday° A     C         1
18 [yes] (0.163) =!this! was on °Wednesday° A     *         1
19 [yes] (0.163) =!this! was on °Wednesday° A     C         1
20 [yes] (0.163) =!this! was on °Wednesday° B     A         1

How can that be obtained?

Answer:

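Replace the * values with NA, fill each NA from its neighbouring value within the group, then number the runs of identical values with rleid and count the distinct run ids with n_distinct: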

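library(dplyr)
library(data.table)  # rleid()
library(tidyr)       # fill()

df %>%
   group_by(Utterance, G_by) %>%
   # turn the "*" placeholders into NA
   mutate(N_G = na_if(G_to, "*")) %>% 
   # replace each NA with the previous value (or the next one at the start of a group)
   fill(N_G, .direction = 'downup') %>% 
   # rleid() gives consecutive identical values the same id; count the ids
   mutate(N_G = n_distinct(rleid(N_G))) %>%
   ungroup()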

Output:

# A tibble: 20 × 4
   Utterance                                G_by  G_to    N_G
   <chr>                                    <chr> <chr> <int>
 1 how old's your mom¿                      A     B         2
 2 how old's your mom¿                      A     *         2
 3 how old's your mom¿                      A     C         2
 4 how old's your mom¿                      C     A         2
 5 how old's your mom¿                      C     A         2
 6 how old's your mom¿                      C     B         2
 7 (0.855)                                  B     C         3
 8 (0.855)                                  B     A         3
 9 (0.855)                                  B     C         3
10 eh six:ty:::-one=                        A     C         1
11 eh six:ty:::-one=                        B     A         1
12 eh six:ty:::-one=                        C     B         1
13 [when was] that¿=                        A     *         1
14 [when was] that¿=                        A     C         1
15 [when was] that¿=                        B     A         1
16 [when was] that¿=                        C     A         1
17 [yes] (0.163) =!this! was on °Wednesday° A     C         1
18 [yes] (0.163) =!this! was on °Wednesday° A     *         1
19 [yes] (0.163) =!this! was on °Wednesday° A     C         1
20 [yes] (0.163) =!this! was on °Wednesday° B     A         1
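If you would rather not depend on data.table and tidyr, the same count can be sketched with dplyr and base R's rle(): drop the * values within each group and count the runs that remain. The data below is an illustrative subset of the question's rows 1-6 and 17-20, with the Utterance text shortened to "mom" and "yes"; the shortened labels are an assumption for readability, not part of the original data.

```r
library(dplyr)

# Illustrative subset of the question's data (rows 1-6 and 17-20),
# with the Utterance text shortened to "mom" and "yes"
toy <- data.frame(
  Utterance = rep(c("mom", "yes"), times = c(6, 4)),
  G_by = c("A", "A", "A", "C", "C", "C", "A", "A", "A", "B"),
  G_to = c("B", "*", "C", "A", "A", "B", "C", "*", "C", "A"),
  stringsAsFactors = FALSE
)

res <- toy %>%
  group_by(Utterance, G_by) %>%
  # drop "*" (so C*C becomes CC), then count runs of identical targets so CC
  # counts once; the single value is recycled to every row of the group
  mutate(N_G = length(rle(G_to[G_to != "*"])$lengths)) %>%
  ungroup()

res$N_G
#> [1] 2 2 2 2 2 2 1 1 1 1
```

One behavioural difference: for a group whose G_to values are all *, this version returns 0, whereas the fill/rleid version returns 1, because fill leaves the NA in place and rleid still counts it as one run.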