I am trying to give all my data frames in R the same levels in a categorical column so that the barplots I make from them are comparable, with categories that are absent from a given data frame kept as unused factor levels with frequency 0.
Currently I have several separate data frames: one global data frame and several broken down by region. Each has a category column and a frequency column. The global data frame contains all the categories, but each regional data frame only has counts for the categories found in that region. For example:
Global DF
| category | frequency |
|---|---|
| red | 2 |
| orange | 4 |
| yellow | 7 |
| green | 1 |
| blue | 4 |
| purple | 4 |
Current West Region DF
| category | frequency |
|---|---|
| orange | 2 |
| blue | 1 |
| purple | 3 |
Desired West Region DF
| category | frequency |
|---|---|
| red | 0 |
| orange | 2 |
| yellow | 0 |
| green | 0 |
| blue | 1 |
| purple | 3 |
This is all based on the original dataset which looks like:
| Region | Category |
|---|---|
| West | orange |
| West | orange |
| West | blue |
| West | purple |
| West | purple |
| West | purple |
| North | red |
| North | yellow |
| ... | ... |
I'm currently using ddply to create the regional DFs, but I can't figure out how to keep the categories with frequency 0 in each one (as shown in the Desired West Region DF above). Thanks for any insight!
CodePudding user response:
You could convert Category to a factor and do the counting with dplyr's count, whose .drop = FALSE option keeps empty factor levels in the result:
library(dplyr)

df |>
  mutate(Category = as.factor(Category)) |>
  count(Region, Category, .drop = FALSE) |>
  filter(Region == "West")
Output:
# A tibble: 6 × 3
Region Category n
<chr> <fct> <int>
1 West blue 1
2 West green 0
3 West orange 2
4 West purple 3
5 West red 0
6 West yellow 0
Data:
library(readr)
df <- read_table("Region Category
West orange
West orange
West blue
West purple
West purple
West purple
North red
North yellow
North green")
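
Note that as.factor sorts the levels alphabetically, which is why the output above does not follow the red, orange, yellow, ... order of the question's global table. If that order matters for the plots, one option is to spell the levels out. The following is a minimal sketch, assuming the level order from the question's global table; the names all_levels, counts and regional are made up for illustration, and the data frame is rebuilt inline so the block runs on its own:

```r
library(dplyr)

# Sample rows from the question, rebuilt inline so the sketch is self-contained.
df <- data.frame(
  Region   = c(rep("West", 6), rep("North", 3)),
  Category = c("orange", "orange", "blue", "purple", "purple", "purple",
               "red", "yellow", "green")
)

# Assumption: the desired order is the one shown in the global table.
all_levels <- c("red", "orange", "yellow", "green", "blue", "purple")

counts <- df |>
  # Explicit levels fix both the set of categories and their order.
  mutate(Category = factor(Category, levels = all_levels)) |>
  # .drop = FALSE keeps levels with no rows; name sets the count column's name.
  count(Region, Category, .drop = FALSE, name = "frequency")

# One data frame per region, each carrying all six categories.
regional <- split(counts, counts$Region)
regional$West
```

With this, regional$West and regional$North share the same categories in the same order, so barplots drawn from them should line up.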
CodePudding user response:
Using base R, table counts every combination of Region and Category, including the empty ones:
subset(as.data.frame(table(df)), Region == "West")
Region Category Freq
2 West blue 1
4 West green 0
6 West orange 2
8 West purple 3
10 West red 0
12 West yellow 0
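
The rows above come out in alphabetical order because the character column is turned into a factor with sorted levels. As a sketch of how the same base R approach could follow the order of the question's global table, the levels can be set explicitly before calling table. The names all_levels, counts and west are hypothetical, and the level order is an assumption taken from the question:

```r
# Sample rows from the question, rebuilt inline so the sketch is self-contained.
df <- data.frame(
  Region   = c(rep("West", 6), rep("North", 3)),
  Category = c("orange", "orange", "blue", "purple", "purple", "purple",
               "red", "yellow", "green")
)

# Assumption: the desired order is the one shown in the global table.
all_levels <- c("red", "orange", "yellow", "green", "blue", "purple")

# Explicit levels control which categories appear and in what order.
df$Category <- factor(df$Category, levels = all_levels)

# table() counts every Region x Category combination, zeros included.
counts <- as.data.frame(table(df))

# Keep only the rows for one region.
west <- subset(counts, Region == "West")
west
```

A call such as barplot(west$Freq, names.arg = as.character(west$Category)) could then be used for each region, with the bars appearing in the same order every time.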
