Create table from wrapped text in R

I have text stored in a variable named a, and I would like to turn it into a table in which the wrapped description cells are unwrapped, so that each description sits on a single row.

a <- 
  "
   category     variable    description                    value
   A            A           This is variable named as A    123
                            which is responsible for sth
                B           This is variable named as B    222.1 
                            which is responsible for sth
                            else
  "

The result I would like to have:

CodePudding user response：

One option to achieve your desired result is to read your variable as a fixed-width file using e.g. readr::read_fwf, followed by some additional data wrangling steps for which I use tidyr and dplyr:

library(dplyr)
library(tidyr)
library(readr)

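# skip = 1 skips the first line of `a`, which is empty
df <- readr::read_fwf(file = a, skip = 1)
# The first parsed row holds the header, so promote it to column names
names(df) <- unlist(df[1, ])
df <- df[-1,]
df %>% 
  # Drop rows that have no description text
  filter(!is.na(description)) %>% 
  # Carry category and variable down into the continuation rows
  tidyr::fill(category, variable) %>% 
  group_by(category, variable) %>% 
  # Glue the wrapped description pieces together and keep the one non-missing value
  summarise(description = paste(description, collapse = " "), value = value[!is.na(value)])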
#> `summarise()` has grouped output by 'category'. You can override using the `.groups` argument.
#> # A tibble: 2 × 4
#> # Groups:   category [1]
#>   category variable description                                            value
#>   <chr>    <chr>    <chr>                                                  <chr>
#> 1 A        A        This is variable named as A which is responsible for … 123  
#> 2 A        B        This is variable named as B which is responsible for … 222.1
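
To make the fill / group / summarise step easier to follow on its own, here is a minimal sketch that skips the file parsing entirely. The `parsed` tibble is a hand-built stand-in (an assumption, not the actual `read_fwf` output) where continuation rows are NA everywhere except in description; `unwrapped` is just an illustrative name. It also converts value to numeric and uses `.groups = "drop"` so the result comes back ungrouped.

```r
library(dplyr)
library(tidyr)

# Hand-built stand-in for the parsed fixed-width data (assumption:
# continuation rows carry NA in every column except `description`)
parsed <- tibble::tribble(
  ~category, ~variable, ~description,                   ~value,
  "A",       "A",       "This is variable named as A",  "123",
  NA,        NA,        "which is responsible for sth", NA,
  NA,        "B",       "This is variable named as B",  "222.1",
  NA,        NA,        "which is responsible for sth", NA,
  NA,        NA,        "else",                         NA
)

unwrapped <- parsed %>%
  # Carry category and variable down into the continuation rows
  fill(category, variable) %>%
  group_by(category, variable) %>%
  summarise(
    # Join the wrapped pieces of each description with single spaces
    description = paste(description, collapse = " "),
    # Each group is assumed to have exactly one non-missing value
    value = as.numeric(value[!is.na(value)]),
    .groups = "drop"
  )

unwrapped
```

If a group could ever contain zero or several non-missing values, the `value[!is.na(value)]` part would need adjusting, since it would then return no row or more than one row for that group.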

CodePudding user response：

This is similar to @stefan's answer. The main difference is that this approach requires you to specify the column widths with readr::fwf_cols(), which may be an advantage or a disadvantage depending on the consistency/stability of your incoming data files.

a <- 
"category     variable    description                    value
A            A           This is variable named as A    123
                         which is responsible for sth
             B           This is variable named as B    222.1 
                         which is responsible for sth
                         else
"
column_widths <-
  readr::fwf_cols(
    category        = 13,
    variable        = 8,
    description     = 32,
    value           = 10
  )

I(a) |> 
  readr::read_fwf(
    col_positions = column_widths,
    skip          = 1         # Because the headers are defined in `column_widths`
  ) |> 
  tidyr::fill(category, variable) |> 
  dplyr::mutate(
    value   = as.character(value),
    value   = dplyr::coalesce(value, "")
  ) |> 
  dplyr::group_by(category, variable) |> 
  dplyr::summarize(
    description = paste0(description, collapse = " "), 
    value       = as.numeric(paste0(value, collapse = " ")), 
  ) |> 
  dplyr::ungroup()

Output:

# A tibble: 2 x 4
  category variable description                 value
  <chr>    <chr>    <chr>                       <dbl>
1 A        A        This is variable named as ~  123 
2 A        B        This is variable named as ~  222.
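
Note that the `~` in the printed output is only the tibble print method shortening long strings for display; the stored descriptions are complete. A small sketch of two ways to check this, using a hand-built `result` tibble (a hypothetical name, with values copied from this thread) in place of the pipeline output:

```r
library(dplyr)

# Hand-built stand-in for the summarised result (values taken from the thread)
result <- tibble::tibble(
  category    = c("A", "A"),
  variable    = c("A", "B"),
  description = c(
    "This is variable named as A which is responsible for sth",
    "This is variable named as B which is responsible for sth else"
  ),
  value       = c(123, 222.1)
)

# Pull the column out as a plain character vector and print it line by line
full_text <- pull(result, description)
writeLines(full_text)

# Or let the tibble print use as much width as the columns need
print(result, width = Inf)
```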