I have a data frame with exam questions organized as you see below in input.
I'm trying to organize it in a tidy way as displayed in output.
In input you can see student's ID, their answer to the specific item suffix = ".text", their score for that specific item suffix = ".score", and their total score.
Input
library(tibble)
input <- tribble(
~ID, ~i1.text, ~i1.score, ~i2.text, ~i2.score, ~total,
"Mark", "A", 0L, "B", 1L, 1L,
"Mary", "C", 1L, "D", 0L, 1L,
"John", "A", 0L, "B", 1L, 1L,
"Abby", "C", 1L, "B", 1L, 2L
)
I need to pivot the data to look like the one below.
I'm quite certain that I can do that with pivot_longer() but I'm quite lost.
Expected Output
output <- tribble(
~ID, ~item, ~text, ~score, ~total,
"Mark", "i1", "A", 0L, 1L,
"Mark", "i2", "B", 1L, 1L,
"Mary", "i1", "C", 1L, 1L,
"Mary", "i2", "D", 0L, 1L,
"John", "i1", "A", 0L, 1L,
"John", "i2", "B", 1L, 1L,
"Abby", "i1", "C", 1L, 2L,
"Abby", "i2", "B", 2L, 2L
)
output
# A tibble: 8 × 5
ID item text score total
<chr> <chr> <chr> <int> <int>
1 Mark i1 A 0 1
2 Mark i2 B 1 1
3 Mary i1 C 1 1
4 Mary i2 D 0 1
5 John i1 A 0 1
6 John i2 B 1 1
7 Abby i1 C 1 2
8 Abby i2 B 2 2
CodePudding user response:
We can use pivot_longer with names_sep as . - the column 'item' return the prefix part of the column names before the . and the .value will return the values of the column with the suffix part of the column name after the .
library(tidyr)
pivot_longer(input, cols = contains("."),
names_to = c("item", ".value"), names_sep = "\\.")
-output
# A tibble: 8 × 5
ID total item text score
<chr> <int> <chr> <chr> <int>
1 Mark 1 i1 A 0
2 Mark 1 i2 B 1
3 Mary 1 i1 C 1
4 Mary 1 i2 D 0
5 John 1 i1 A 0
6 John 1 i2 B 1
7 Abby 2 i1 C 1
8 Abby 2 i2 B 1
CodePudding user response:
Here is an similar but alternative approach using names_pattern:
Explanation: (.*)\\.(.*) is regex:
() captures a group
. captures any character
* is a quantifier means 0 or more
\\ Escape character
The regular expression means: any amount of character followed by a dot followed by any amount of character. This regex matches your column names:
library(dplyr)
library(tidyr)
input %>%
pivot_longer(
cols = -c(ID, total),
names_to = c('item', '.value'),
names_pattern = '(.*)\\.(.*)'
)
ID total item text score
<chr> <int> <chr> <chr> <int>
1 Mark 1 i1 A 0
2 Mark 1 i2 B 1
3 Mary 1 i1 C 1
4 Mary 1 i2 D 0
5 John 1 i1 A 0
6 John 1 i2 B 1
7 Abby 2 i1 C 1
8 Abby 2 i2 B 1
