How can I best use dplyr to subset data and create relative frequency tables?

I'm using the iris data set to learn how to use dplyr, and am trying to create a relative frequency table that looks like this:

Petal.Width	.1	.2	.3	.4	.5	.6	1	1.1	1.2	1.3	1.4	1.5	1.6	1.7	1.8
Species
setosa	0.10	0.58	0.14	0.14	0.02	0.02	0	0	0	0	0	0	0	0	0
versicolor	0	0	0	0	0	0	0.14	0.06	0.10	0.26	0.14	0.02	0.20	0.04	0.06

I'm struggling to group the observations by species, and then produce relative frequencies on a species by species basis.

I'm guessing it'll have to be something using group_by, mutate, and count, but the closest thing I could find online was this:

my_data %>% 
    group_by(Petal.Width,Species) %>% 
    summarise(n = n()) %>%
    ungroup %>% 
    mutate(total = sum(n), rel.freq = n / total)

This is still not quite what I'm looking for: after ungroup, sum(n) is the total number of observations across all species, not the number per species, so the frequencies are relative to the whole data set.

Any help is appreciated greatly!

CodePudding user response:

You could do this in dplyr, but it's a one-liner in base R:

t(apply(table(iris$Species, iris$Petal.Width), 1, function(x) x/sum(x)))
#>             
#>              0.1  0.2  0.3  0.4  0.5  0.6    1  1.1 1.2  1.3  1.4  1.5  1.6
#>   setosa     0.1 0.58 0.14 0.14 0.02 0.02 0.00 0.00 0.0 0.00 0.00 0.00 0.00
#>   versicolor 0.0 0.00 0.00 0.00 0.00 0.00 0.14 0.06 0.1 0.26 0.14 0.20 0.06
#>   virginica  0.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.0 0.00 0.02 0.04 0.02
#>             
#>               1.7  1.8 1.9    2  2.1  2.2  2.3  2.4  2.5
#>   setosa     0.00 0.00 0.0 0.00 0.00 0.00 0.00 0.00 0.00
#>   versicolor 0.02 0.02 0.0 0.00 0.00 0.00 0.00 0.00 0.00
#>   virginica  0.02 0.22 0.1 0.12 0.12 0.06 0.16 0.06 0.06

Created on 2022-02-02 by the reprex package (v2.0.1)
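As a possibly more idiomatic variant of the same idea, base R's prop.table() with margin = 1 divides each row of a table by its row total, which should give the same per-species proportions as the apply() call. This is only a sketch; the names tab and rel_freq are illustrative and not part of the original answer.

```r
# Contingency table of counts: one row per species, one column per Petal.Width
tab <- table(iris$Species, iris$Petal.Width)

# margin = 1 normalises within rows, so each species' row sums to 1
rel_freq <- prop.table(tab, margin = 1)

round(rel_freq, 2)
```

Using margin = 2 instead would normalise within each Petal.Width column, which is not what the question asks for.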

CodePudding user response:

Something like this?

Not sure about the "wide" format though; I'd be inclined to keep it as long (omit the pivot_wider step).

library(dplyr)
library(tidyr)

iris %>% 
  count(Species, Petal.Width) %>% 
  group_by(Species) %>% 
  mutate(p = n/sum(n)) %>% 
  ungroup() %>% 
  select(-n) %>% 
  pivot_wider(names_from = "Petal.Width", values_from = "p")

Result:

  Species    `0.1` `0.2` `0.3` `0.4` `0.5` `0.6`   `1` `1.1` `1.2` `1.3` `1.4` `1.5` `1.6` `1.7` `1.8` `1.9`   `2` `2.1` `2.2` `2.3` `2.4` `2.5`
  <fct>      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 setosa       0.1  0.58  0.14  0.14  0.02  0.02 NA    NA     NA   NA    NA    NA    NA    NA    NA     NA   NA    NA    NA    NA    NA    NA   
2 versicolor  NA   NA    NA    NA    NA    NA     0.14  0.06   0.1  0.26  0.14  0.2   0.06  0.02  0.02  NA   NA    NA    NA    NA    NA    NA   
3 virginica   NA   NA    NA    NA    NA    NA    NA    NA     NA   NA     0.02  0.04  0.02  0.02  0.22   0.1  0.12  0.12  0.06  0.16  0.06  0.06
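The NA cells are Species/Petal.Width combinations that never occur in the data. If zeros are preferred, as in the table in the question, one option is the values_fill argument of pivot_wider(). The sketch below assumes a reasonably recent tidyr (1.1.0 or later, where a scalar values_fill is accepted); the names rel_freq_long and rel_freq_wide are illustrative only.

```r
library(dplyr)
library(tidyr)

# Long format: one row per observed (Species, Petal.Width) pair
rel_freq_long <- iris %>%
  count(Species, Petal.Width) %>%
  group_by(Species) %>%
  mutate(p = n / sum(n)) %>%  # share of each width within its species
  ungroup() %>%
  select(-n)

# Wide format: unobserved combinations become 0 instead of NA
rel_freq_wide <- rel_freq_long %>%
  pivot_wider(names_from = Petal.Width, values_from = p, values_fill = 0)

rel_freq_wide
```

Keeping rel_freq_long around also covers the case where the long layout is all that is needed, since it is simply the pipeline without the pivot_wider step.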