I am new to R and trying to mutate the following character variable "Tax.Rate..." into four different columns (i.e., CGST, SGST, UTGST, and IGST) with tax rates applicable to that head under that column. Example of dataset shown below:
df # A tibble: 3 x 1 Tax.Rate....
1 "CGST 2.5% SGST 2.5%" 2 "CGST 6% UTGST 6%"
3 "IGST 12% "
I have tried using 'separate' and 'mutate' functions with little success
Any guidance would be appreciated
CodePudding user response:
I'm sure this can be done concisely in base R as well, but here's a tidyverse approach where I first split the data into a new row at each plus, then trim the extra spaces, then split into two columns.
library(tidyverse)
df <- data.frame(Tax.Rate = c("CGST 2.5% SGST 2.5%", "CGST 6% UTGST 6%", "IGST 12% "))
df %>%
mutate(orig_row = row_number()) %>% # optional, for later tracking
separate_rows(Tax.Rate, sep = "\\ ") %>%
mutate(Tax.Rate = str_trim(Tax.Rate)) %>%
separate(Tax.Rate, c("group", "rate"), extra = "merge", remove = FALSE)
# A tibble: 5 × 4
Tax.Rate group rate orig_row
<chr> <chr> <chr> <int>
1 CGST 2.5% CGST 2.5% 1
2 SGST 2.5% SGST 2.5% 1
3 CGST 6% CGST 6% 2
4 UTGST 6% UTGST 6% 2
5 IGST 12% IGST 12% 3
This will produce a "long" shaped table, but if you want it "wide" with separate columns for each group (jurisdiction?) then you could add the following:
[from the end of the "separate()" line] %>%
select(-Tax.Rate) %>%
pivot_wider(names_from = group, values_from = rate)
for this result
# A tibble: 3 × 5
orig_row CGST SGST UTGST IGST
<int> <chr> <chr> <chr> <chr>
1 1 2.5% 2.5% NA NA
2 2 6% NA 6% NA
3 3 NA NA NA 12%
CodePudding user response:
We could:
- Use
separate_rowsseparating byusing\\to escape the special character - then
str_trimto remove starting space etc... separatethis column by" "4.group_byand addidto avoid nested outputpivot_wider
library(dplyr)
library(tidyr)
library(stringr)
df %>%
separate_rows(Tax.Rate, sep = "\\ ") %>%
mutate(Tax.Rate = str_trim(Tax.Rate)) %>%
separate(Tax.Rate, c("name", "value"), sep = " ") %>%
group_by(name) %>%
mutate(id = row_number()) %>%
pivot_wider(
names_from = name,
values_from = value
) %>%
select(-id)
CGST SGST UTGST IGST
<chr> <chr> <chr> <chr>
1 2.5% 2.5% 6% 12%
2 6% NA NA NA
data:
structure(list(Tax.Rate = c("CGST 2.5% SGST 2.5%", "CGST 6% UTGST 6%",
"IGST 12%")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-3L))
