Counting (character) values in a data frame as a function of two variables

Data:

(The actual data set is much larger; this is a small representative sample for the sake of clarity.)

I have the following vectors:

Response <- c('blue', 'yellow', 'red', 'yellow', 'blue', 'red', 'red', 'blue', 'yellow', 'green', 'yellow', 'yellow', 'green')

ZN  <- c('C', 'B', 'A','C', 'B', 'A', 'C', 'B', 'A', 'B', 'A', 'B', 'A')

Stimuli <- c('e','e','e','e','e','e','e','e','e','c','c','c','c')

Organized in this data frame:

test <- data.frame(Response, ZN, Stimuli)

Explanation: stimulus "e" has three levels (C, B, A) and stimulus "c" has two levels (B, A). Participants could assign certain colours to these levels.

Goal: for stimulus "e", count how many times participants chose blue, red, or yellow at each level (C, B, A). In the sample above, for example, blue was chosen twice for level B and yellow once. Then do the same for stimulus "c", which has two levels (B, A).

One (cumbersome) way of doing it is to count the rows where Response == 'blue', ZN == 'C' and Stimuli == 'e' all hold at once:

sum(test$Response == 'blue' & test$ZN == 'C' & test$Stimuli == 'e')

Then do this for colours yellow and red (the only possible ones for stimuli "e"). And then replicate this procedure for ZN =='B' and ZN =='A'.

There must be a smarter way of doing this. I'm new to R, so sorry if the question is too silly. In the end, the information I want to have is the following:

Stimuli e

ZN = C
1 blue
1 red
1 yellow

ZN = B
2 blue
1 yellow

ZN = A
2 red
1 yellow

Thank you!

CodePudding user response：

Here's a one-liner base R solution that gives the answer in a format closer to what you requested:

lapply(split(test, test$ZN), function(x) t(table(x$Stimuli, x$Response)))
#> $A
#>         
#>          c e
#>   green  1 0
#>   red    0 2
#>   yellow 1 1
#> 
#> $B
#>         
#>          c e
#>   blue   0 2
#>   green  1 0
#>   yellow 1 1
#> 
#> $C
#>         
#>          e
#>   blue   1
#>   red    1
#>   yellow 1

Created on 2022-01-28 by the reprex package (v2.0.1)
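If one long table is easier to work with than a list of per-level tables, the same counts can also be cross-tabulated over all three variables at once in base R. This is only a sketch on the sample data from the question; the object name counts is chosen here for illustration.

```r
# Sample data from the question
Response <- c('blue', 'yellow', 'red', 'yellow', 'blue', 'red', 'red',
              'blue', 'yellow', 'green', 'yellow', 'yellow', 'green')
ZN       <- c('C', 'B', 'A', 'C', 'B', 'A', 'C', 'B', 'A', 'B', 'A', 'B', 'A')
Stimuli  <- c('e', 'e', 'e', 'e', 'e', 'e', 'e', 'e', 'e', 'c', 'c', 'c', 'c')
test <- data.frame(Response, ZN, Stimuli)

# Cross-tabulate all three variables, then flatten to one row per combination
counts <- as.data.frame(xtabs(~ Stimuli + ZN + Response, data = test),
                        stringsAsFactors = FALSE)

# Drop combinations that never occur and sort for readability
counts <- counts[counts$Freq > 0, ]
counts <- counts[order(counts$Stimuli, counts$ZN, counts$Response), ]
print(counts, row.names = FALSE)
```

The Freq column holds the count for each combination. Calling ftable(xtabs(~ Stimuli + ZN + Response, data = test)) should print the same numbers as a compact flat table instead.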

CodePudding user response：

We can do this type of summary using grouping. With the tidyverse, count() saves us the step of writing group_by() first: we just list the columns we want to group by in the count() statement.

library(tidyverse)

test %>% count(Response, Stimuli, ZN) %>% 
  arrange(Stimuli, ZN, Response)

# A tibble: 11 x 4
# Groups:   Response, Stimuli, ZN [11]
   Response Stimuli ZN        n
   <fct>    <fct>   <fct> <int>
 1 green    c       A         1
 2 yellow   c       A         1
 3 green    c       B         1
 4 yellow   c       B         1
 5 red      e       A         2
 6 yellow   e       A         1
 7 blue     e       B         2
 8 yellow   e       B         1
 9 blue     e       C         1
10 red      e       C         1
11 yellow   e       C         1
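If a layout with one row per Stimuli/ZN pair and one column per colour is preferred, the long result can be spread out with tidyr. The following is a sketch that assumes dplyr and a reasonably recent tidyr are installed; the name wide is made up for this example.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
test <- data.frame(
  Response = c('blue', 'yellow', 'red', 'yellow', 'blue', 'red', 'red',
               'blue', 'yellow', 'green', 'yellow', 'yellow', 'green'),
  ZN       = c('C', 'B', 'A', 'C', 'B', 'A', 'C', 'B', 'A', 'B', 'A', 'B', 'A'),
  Stimuli  = c('e', 'e', 'e', 'e', 'e', 'e', 'e', 'e', 'e', 'c', 'c', 'c', 'c')
)

# Count each combination, then turn the colours into columns;
# combinations that never occur are filled with 0
wide <- test %>%
  count(Stimuli, ZN, Response) %>%
  pivot_wider(names_from = Response, values_from = n, values_fill = 0)

print(wide)
```

Each row then reads directly as "for this stimulus and level, how often was each colour chosen", which is close to the layout asked for in the question.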

CodePudding user response：

Your goal is stated in a very specific way, so I'm not sure what problem you actually need to solve.

It sounds like you want to count the occurrences of each combination of Stimuli, ZN and Response. If that's the case, just group by all three and let dplyr::n() do the counting within each group. This might be a useful reference.

You might need to run install.packages("tidyverse") and then library(tidyverse) for it to work.

library(tidyverse)  # provides %>% and loads dplyr and tibble

test %>%
  dplyr::group_by(Response, ZN, Stimuli) %>%
  dplyr::summarise(
    count = dplyr::n(),  # number of rows in each group
    .groups = "drop") %>%
  tibble::as_tibble()
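To get from these grouped counts to the per-level listing shown in the question, one option is to filter to a single stimulus and print each level in turn. This is a rough sketch, not the only way to do it; the names counts and e_counts are illustrative, and the sort order (levels C, B, A, most frequent colour first) is an assumption about the desired layout.

```r
library(dplyr)

# Sample data from the question
test <- data.frame(
  Response = c('blue', 'yellow', 'red', 'yellow', 'blue', 'red', 'red',
               'blue', 'yellow', 'green', 'yellow', 'yellow', 'green'),
  ZN       = c('C', 'B', 'A', 'C', 'B', 'A', 'C', 'B', 'A', 'B', 'A', 'B', 'A'),
  Stimuli  = c('e', 'e', 'e', 'e', 'e', 'e', 'e', 'e', 'e', 'c', 'c', 'c', 'c')
)

# Count every Stimuli / ZN / Response combination
counts <- test %>%
  group_by(Stimuli, ZN, Response) %>%
  summarise(count = n(), .groups = "drop")

# Keep stimulus "e" only; order levels C, B, A and most frequent colour first
e_counts <- counts %>%
  filter(Stimuli == "e") %>%
  arrange(desc(ZN), desc(count), Response)

# Print one block per level, e.g. a "ZN = ..." header followed by "<count> <colour>" lines
for (level in unique(as.character(e_counts$ZN))) {
  rows <- e_counts[e_counts$ZN == level, ]
  cat("ZN =", level, "\n")
  cat(paste(rows$count, rows$Response), sep = "\n")
  cat("\n\n")
}
```

Changing the filter to Stimuli == "c" should give the same kind of listing for the second stimulus.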