From the text stored in a variable named a, I would like to obtain a table in which the wrapped description cells are joined back into a single line.
a <-
"
category     variable description                     value
A            A        This is variable named as A     123
                      which is responsible for sth
             B        This is variable named as B     222.1
                      which is responsible for sth
                      else
"
The result I want to have:
CodePudding user response:
One option to achieve your desired result would be to read your variable as a fixed-width file using e.g. readr::read_fwf, followed by some additional data wrangling steps where I make use of tidyr and dplyr:
library(dplyr)
library(tidyr)
library(readr)
df <- readr::read_fwf(file = a, skip = 1)
names(df) <- unlist(df[1, ])
df <- df[-1,]
df %>%
  filter(!is.na(description)) %>%
  tidyr::fill(category, variable) %>%
  group_by(category, variable) %>%
  summarise(description = paste(description, collapse = " "), value = value[!is.na(value)])
#> `summarise()` has grouped output by 'category'. You can override using the `.groups` argument.
#> # A tibble: 2 × 4
#> # Groups: category [1]
#>   category variable description                                            value
#>   <chr>    <chr>    <chr>                                                  <chr>
#> 1 A        A        This is variable named as A which is responsible for … 123
#> 2 A        B        This is variable named as B which is responsible for … 222.1
CodePudding user response:
This is similar to @stefan's answer. The main difference is that this approach requires you to specify column_widths with readr::fwf_cols(). (That may be an advantage or a disadvantage, depending on the consistency/stability of your incoming data files.)
a <-
"category     variable description                     value
A            A        This is variable named as A     123
                      which is responsible for sth
             B        This is variable named as B     222.1
                      which is responsible for sth
                      else
"
column_widths <-
  readr::fwf_cols(
    category    = 13,
    variable    = 8,
    description = 32,
    value       = 10
  )
I(a) |>
  readr::read_fwf(
    col_positions = column_widths,
    skip = 1 # Because the headers are defined in `column_widths`
  ) |>
  tidyr::fill(category, variable) |>
  dplyr::mutate(
    value = as.character(value),
    value = dplyr::coalesce(value, "")
  ) |>
  dplyr::group_by(category, variable) |>
  dplyr::summarize(
    description = paste0(description, collapse = " "),
    value = as.numeric(paste0(value, collapse = " ")),
  ) |>
  dplyr::ungroup()
Output:
# A tibble: 2 x 4
  category variable description                 value
  <chr>    <chr>    <chr>                       <dbl>
1 A        A        This is variable named as ~  123
2 A        B        This is variable named as ~  222.
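If it helps to see what the fixed widths are actually doing, here is a rough base R sketch of the same idea without readr, tidyr or dplyr. It is only an illustration under a few assumptions: the rows are rebuilt with sprintf() so that each field has exactly the widths used above (13, 8, 32, 10), the helper fill_down() is a made-up name, and every category/variable group is assumed to carry exactly one non-empty value.

```r
# Widths taken from the answer above; the rows are the question's data,
# padded with sprintf() so every field has exactly the stated width.
widths <- c(category = 13, variable = 8, description = 32, value = 10)
fmt <- paste0("%-", widths, "s", collapse = "")  # "%-13s%-8s%-32s%-10s"

rows <- list(
  c("A", "A", "This is variable named as A",  "123"),
  c("",  "",  "which is responsible for sth", ""),
  c("",  "B", "This is variable named as B",  "222.1"),
  c("",  "",  "which is responsible for sth", ""),
  c("",  "",  "else",                         "")
)
lines <- vapply(
  rows,
  function(r) do.call(sprintf, c(list(fmt), as.list(r))),
  character(1)
)

# Cut each line at the column boundaries implied by the widths.
starts <- cumsum(c(1, head(widths, -1)))
ends   <- cumsum(widths)
fields <- lapply(
  seq_along(widths),
  function(i) trimws(substring(lines, starts[i], ends[i]))
)
df <- as.data.frame(setNames(fields, names(widths)), stringsAsFactors = FALSE)

# Hypothetical helper: carry the last non-empty entry downwards.
fill_down <- function(x) {
  x[x == ""] <- NA
  filled <- !is.na(x)
  c(NA, x[filled])[cumsum(filled) + 1]
}
df$category <- fill_down(df$category)
df$variable <- fill_down(df$variable)

# Collapse the wrapped description lines within each category/variable group.
pieces <- split(df, list(df$category, df$variable), drop = TRUE)
result <- do.call(rbind, lapply(pieces, function(g) {
  data.frame(
    category    = g$category[1],
    variable    = g$variable[1],
    description = paste(g$description, collapse = " "),
    value       = as.numeric(g$value[g$value != ""]),  # assumes one value per group
    stringsAsFactors = FALSE
  )
}))
rownames(result) <- NULL
print(result)
```

The logic mirrors the pipeline above: slice by position, fill the blank key cells downwards, then paste the description pieces together per group. For real files the readr version is probably the more convenient choice.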

