Hello lovely people of SO!
Guys I have the following raw dataset
ID_TRIAL<-c(1,1,1,2,3,4,5,5,5,6,6,6,7,7,8,8,8,8)
TYPE_FAIL<-c("A","B","C","F","A","A","A","B","K","T","F","A","A","B","B","Q","P","I")
| ID TRIAL | TYPE_FAIL |
|---|---|
| 1 | A |
| 1 | B |
| 1 | C |
| 2 | F |
| 3 | A |
| 4 | A |
| 5 | A |
| 5 | B |
| 5 | K |
| 6 | T |
| 6 | F |
| 6 | A |
| 7 | A |
| 7 | B |
| 8 | B |
| 8 | Q |
| 8 | P |
| 8 | I |
I need to transform this dataset in such manner that I am able to create a matrix whose columns are the TYPE OF FAILS in alphabetical order and its rows are a binary representation of all unique TYPE OF FAILS a TRIAL had for instance
all the TYPES OF FAILS are in alphabetical order: A B C F I K P Q T
So for TRAIL 8 the matrix row will look like this
| A | B | C | F | I | K | P | Q | T |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 |
The zeros in all other cells represent that during trial 8 for example FAIL TYPE A did not occurred and so on
my desired output would look like this:
| TRIAL | A | B | C | F | I | K | P | Q | T |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 3 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 6 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
| 7 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 8 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 |
Thank you all of you guys for helping me out I will be super attentive to read and response to all of your comments
Some of my thought-process behind my solution:
First I need to group by ID TRIAL then
I need to find a function or a routine that will look for
a letter lets say "B" and add a number one to my matrix under the column B for the
row of the TRIAL in case, I can do this using multiple ifelse lines but
my real dataset is quite large and I dont know if there is a way to perform this faster so thank you so much for helping me out on this
CodePudding user response:
Here's a tidyverse solution using dplyr::count and tidyr::pivot_wider.
library(dplyr)
library(tidyr)
df1 <- data.frame(ID_TRIAL = c(1, 1, 1, 2, 3, 4, 5, 5 , 5, 6, 6, 6, 7, 7, 8, 8, 8, 8),
TYPE_FAIL = c("A", "B", "C", "F", "A", "A", "A", "B", "K", "T", "F",
"A", "A", "B", "B", "Q", "P", "I"))
df1 %>%
count(ID_TRIAL, TYPE_FAIL) %>%
pivot_wider(names_from = "TYPE_FAIL",
values_from = "n",
names_sort = TRUE) %>%
replace(is.na(.), 0)
Result:
# A tibble: 8 × 10
ID_TRIAL A B C F I K P Q T
<dbl> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 1 1 1 0 0 0 0 0 0
2 0 0 0 1 0 0 0 0 0
3 1 0 0 0 0 0 0 0 0
4 1 0 0 0 0 0 0 0 0
5 1 1 0 0 0 1 0 0 0
6 1 0 0 1 0 0 0 0 1
7 1 1 0 0 0 0 0 0 0
8 0 1 0 0 1 0 1 1 0
CodePudding user response:
Matrix format. Generating a matrix from the table returned values
ID_TRIAL<-c(1,1,1,2,3,4,5,5,5,6,6,6,7,7,8,8,8,8)
TYPE_FAIL<-c("A","B","C","F","A","A","A","B","K","T","F","A","A","B","B","Q","P","I")
df <- data.frame(ID_TRIAL = ID_TRIAL, TYPE_FAIL = TYPE_FAIL)
mat <- table(df) |> matrix(nrow = 8, dimnames = list(unique(df$ID_TRIAL),
sort(unique(df$TYPE_FAIL))))
A B C F I K P Q T
1 1 1 1 0 0 0 0 0 0
2 0 0 0 1 0 0 0 0 0
3 1 0 0 0 0 0 0 0 0
4 1 0 0 0 0 0 0 0 0
5 1 1 0 0 0 1 0 0 0
6 1 0 0 1 0 0 0 0 1
7 1 1 0 0 0 0 0 0 0
8 0 1 0 0 1 0 1 1 0
I thought you meant a literal matrix.
If you meant data.frame you can do. Using the table function to generate some values we can use to pivot wider
data.frame(table(df)) |>
pivot_wider(id_cols = ID_TRIAL, names_from = TYPE_FAIL, values_from = Freq)
ID_TRIAL A B C F I K P Q T
<fct> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 1 1 1 1 0 0 0 0 0 0
2 2 0 0 0 1 0 0 0 0 0
3 3 1 0 0 0 0 0 0 0 0
4 4 1 0 0 0 0 0 0 0 0
5 5 1 1 0 0 0 1 0 0 0
6 6 1 0 0 1 0 0 0 0 1
7 7 1 1 0 0 0 0 0 0 0
8 8 0 1 0 0 1 0 1 1 0
