I have a tab-delimited text file with a series of timestamped data. I've read it into R using read.delim() and it gives me all the data as characters in a single column. Example:
df <- data.frame(c("2017","A","B","C","2018","X","Y","Z","2018","X","B","C"))
colnames(df) <- "col1"
df
I want to convert every n # of rows (in this case 4) to columns and stack them without using a for loop. Desired result:
col1 <- c("2017","2018","2018")
col2 <- c("A","X","X")
col3 <- c("B","Y","B")
col4 <- c("C","Z","C")
df2 <- data.frame(col1, col2, col3, col4)
df2
I created a for loop, but it can't handle the millions of rows in my df. Should I convert to a matrix? Would converting to a list help? I tried as.matrix(read.table()) and unlist() but without success.
CodePudding user response:
You could use tidyr to reshape data into the form you want, you will first need to mutate the data as to identify which indexes should be first, and which go with a specific column.
Assuming you know there are 4 groups (n = 4) you could do something like the following with the help of the dplyr package.
library(tidyr)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
n <- 4
df <- data.frame(x = c("2017","A","B","C","2018","X","Y","Z","2018","X","B","C")) %>%
mutate(cols = rep(1:n, n()/n),
id = rep(1:(n()/n), each = n))
pivot_wider(df, id_cols = id, names_from = cols, values_from = x, names_prefix = "cols")
#> # A tibble: 3 × 5
#> id cols1 cols2 cols3 cols4
#> <int> <chr> <chr> <chr> <chr>
#> 1 1 2017 A B C
#> 2 2 2018 X Y Z
#> 3 3 2018 X B C
Or, in base you could use the split function on the vector, and then use do.call to make the data frame
df <- data.frame(x = c("2017","A","B","C","2018","X","Y","Z","2018","X","B","C"))
split_df <- setNames(split(df$x, rep(1:4, 3)), paste0("cols", 1:4))
do.call("data.frame", split_df)
#> cols1 cols2 cols3 cols4
#> 1 2017 A B C
#> 2 2018 X Y Z
#> 3 2018 X B C
Created on 2022-02-01 by the reprex package (v2.0.1)
CodePudding user response:
The easiest way would be to create a matrix with matrix(ncol=x, byrow=TRUE), then convert back to data.frame. Should be quite fast too.
df |>
unlist() |>
matrix(ncol=4, byrow = TRUE) |>
as.data.frame() |>
setNames(paste0('col', 1:4))
col1 col2 col3 col4
1 2017 A B C
2 2018 X Y Z
3 2018 X B C
