I'm using the iris data set to learn how to use dplyr, and am trying to create a relative frequency table that looks like this:
| Petal.Width | .1 | .2 | .3 | .4 | .5 | .6 | 1 | 1.1 | 1.2 | 1.3 | 1.4 | 1.5 | 1.6 | 1.7 | 1.8 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Species | | | | | | | | | | | | | | | |
| setosa | 0.10 | 0.58 | 0.14 | 0.14 | 0.02 | 0.02 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| versicolor | 0 | 0 | 0 | 0 | 0 | 0 | 0.14 | 0.06 | 0.10 | 0.26 | 0.14 | 0.02 | 0.20 | 0.04 | 0.06 |
I'm struggling to group the observations by species and then produce relative frequencies on a species-by-species basis.
I'm guessing it'll have to be something using group_by, mutate, and count, but the closest thing I could find online was this:
my_data %>%
  group_by(Petal.Width, Species) %>%
  summarise(n = n()) %>%
  ungroup() %>%
  mutate(total = sum(n), rel.freq = n / total)
This still isn't quite what I'm looking for, because it divides each count by the total number of observations rather than by the number of observations per species.
Any help is appreciated greatly!
CodePudding user response:
You could do this in dplyr, but it's a one-liner in base R:
t(apply(table(iris$Species, iris$Petal.Width), 1, function(x) x/sum(x)))
#>
#> 0.1 0.2 0.3 0.4 0.5 0.6 1 1.1 1.2 1.3 1.4 1.5 1.6
#> setosa 0.1 0.58 0.14 0.14 0.02 0.02 0.00 0.00 0.0 0.00 0.00 0.00 0.00
#> versicolor 0.0 0.00 0.00 0.00 0.00 0.00 0.14 0.06 0.1 0.26 0.14 0.20 0.06
#> virginica 0.0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.0 0.00 0.02 0.04 0.02
#>
#> 1.7 1.8 1.9 2 2.1 2.2 2.3 2.4 2.5
#> setosa 0.00 0.00 0.0 0.00 0.00 0.00 0.00 0.00 0.00
#> versicolor 0.02 0.02 0.0 0.00 0.00 0.00 0.00 0.00 0.00
#> virginica 0.02 0.22 0.1 0.12 0.12 0.06 0.16 0.06 0.06
Created on 2022-02-02 by the reprex package (v2.0.1)
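The same row-wise normalisation can probably also be written with base R's prop.table(), which divides each cell by its row total when margin = 1. This is a minimal sketch of an alternative to the apply() call above, not part of the original answer; the names counts and rel_freq and the round() call are only there for illustration.

```r
# Cross-tabulate species (rows) against petal width (columns).
counts <- table(iris$Species, iris$Petal.Width)

# margin = 1 normalises within each row, so every species sums to 1.
rel_freq <- prop.table(counts, margin = 1)

# Round only for display; rel_freq keeps the unrounded proportions.
round(rel_freq, 2)
```

Because the result is still a table object, it keeps the species and petal-width labels without the extra t() that apply() needs.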
CodePudding user response:
Something like this?
Not sure about the "wide" format though; I'd be inclined to keep it in long format (omit the pivot_wider step).
library(dplyr)
library(tidyr)

iris %>%
  count(Species, Petal.Width) %>%
  group_by(Species) %>%
  mutate(p = n / sum(n)) %>%
  ungroup() %>%
  select(-n) %>%
  pivot_wider(names_from = "Petal.Width", values_from = "p")
Result:
Species `0.1` `0.2` `0.3` `0.4` `0.5` `0.6` `1` `1.1` `1.2` `1.3` `1.4` `1.5` `1.6` `1.7` `1.8` `1.9` `2` `2.1` `2.2` `2.3` `2.4` `2.5`
<fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 setosa 0.1 0.58 0.14 0.14 0.02 0.02 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
2 versicolor NA NA NA NA NA NA 0.14 0.06 0.1 0.26 0.14 0.2 0.06 0.02 0.02 NA NA NA NA NA NA NA
3 virginica NA NA NA NA NA NA NA NA NA NA 0.02 0.04 0.02 0.02 0.22 0.1 0.12 0.12 0.06 0.16 0.06 0.06
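The NA entries above mark species/width combinations that never occur, whereas the table in the question shows 0 for them. A minimal sketch of one way to get zeros, assuming a tidyr version whose pivot_wider() supports the values_fill argument (1.1.0 or later should accept a scalar); the name wide is only for illustration.

```r
library(dplyr)
library(tidyr)

wide <- iris %>%
  count(Species, Petal.Width) %>%
  group_by(Species) %>%
  mutate(p = n / sum(n)) %>%   # proportion within each species
  ungroup() %>%
  select(-n) %>%
  # values_fill = 0 writes 0 instead of NA for missing combinations.
  pivot_wider(names_from = Petal.Width, values_from = p, values_fill = 0)

wide
```

Filling with 0 is reasonable here because a missing combination really does mean a count of zero; for data where a gap means "not measured", leaving NA is likely the safer choice.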
