I want to ask for ideas on creating a syntax to pivot_longer given on this.
I've already tried researching in the internet but I can't seem to find any examples that is similar to my data given where it has a Metric column which is also seperated in 3 different columns of months.
My desire final output is to have seven columns consisting of (regions,months, and the five Metrics)
How to formulate the pivot_longer and pivot_wider syntax to clean my data in order for me to visualize it?
CodePudding user response:
The tricky part isn't pivot_longer. You first have to clean your Excel spreadsheet, i.e. get rid of empty rows and merge the two header rows containing the names of the variables and the dates.
One approach to achieve your desired result may look like so:
library(readxl)
library(tidyr)
library(janitor)
library(dplyr)
x <- read_excel("data/Employment.xlsx", skip = 3, col_names = FALSE) %>%
# Get rid of empty rows and cols
janitor::remove_empty()
# Make column names
col_names <- data.frame(t(x[1:2,])) %>%
fill(1) %>%
unite(name, 1:2, na.rm = TRUE) %>%
pull(name)
x <- x[-c(1:2),]
names(x) <- col_names
# Convert to long and values to numerics
x %>%
pivot_longer(-Region, names_to = c(".value", "months"), names_sep = "_") %>%
separate(months, into = c("month", "year")) %>%
mutate(across(!c(Region, month, year), as.numeric))
#> # A tibble: 6 × 8
#> Region month year `Total Population … `Labor Force Part… `Employment Rat…
#> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 Philippin… April 2020f 73722. 55.7 82.4
#> 2 Philippin… Janu… 2021p 74733. 60.5 91.3
#> 3 Philippin… April 2021p 74971. 63.2 91.3
#> 4 National … April 2020f 9944. 54.2 87.7
#> 5 National … Janu… 2021p 10051. 57.2 91.2
#> 6 National … April 2021p 10084. 60.1 85.6
#> # … with 2 more variables: Unemployment Rate <dbl>, Underemployment Rate <dbl>
