I want to generate a new ID column in my data frame based on another column. My df looks something like this:
> TCR <- c("CAAETSGSRLTF;CASSQEGTGVYEQYF", "CGSRLTF;CASSQEGTGVYEQYF", "CAAETSGSRLTF;CASSQEGT", "CAAETSGSRLTF;CASSQEGTGVYEQYF")
> df <- data.frame(cdr3 = TCR)
> df
cdr3
1 CAAETSGSRLTF;CASSQEGTGVYEQYF
2 CGSRLTF;CASSQEGTGVYEQYF
3 CAAETSGSRLTF;CASSQEGT
4 CAAETSGSRLTF;CASSQEGTGVYEQYF
I want to add a new column, df$ID, that assigns an ID to each value in df$cdr3. Each distinct value should get a new ID, and a repeated value should reuse the ID it was given the first time. The result should look something like this:
> df
cdr3 ID
1 CAAETSGSRLTF;CASSQEGTGVYEQYF X1
2 CGSRLTF;CASSQEGTGVYEQYF X2
3 CAAETSGSRLTF;CASSQEGT X3
4 CAAETSGSRLTF;CASSQEGTGVYEQYF X1
Thanks a lot guys
CodePudding user response:
In base R, match() returns the position of each 'cdr3' value within unique(df$cdr3), so repeated values get the same index. paste0() then prefixes that index with "X":
df$ID <- paste0("X", match(df$cdr3, unique(df$cdr3)))
-output
> df
cdr3 ID
1 CAAETSGSRLTF;CASSQEGTGVYEQYF X1
2 CGSRLTF;CASSQEGTGVYEQYF X2
3 CAAETSGSRLTF;CASSQEGT X3
4 CAAETSGSRLTF;CASSQEGTGVYEQYF X1
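To see why this works, it can help to look at the intermediate values. The sketch below rebuilds the example data and runs the one-liner step by step; the names `u` and `idx` are introduced here only for illustration and are not part of the answer above.

```r
# Rebuild the example data from the question
df <- data.frame(cdr3 = c("CAAETSGSRLTF;CASSQEGTGVYEQYF",
                          "CGSRLTF;CASSQEGTGVYEQYF",
                          "CAAETSGSRLTF;CASSQEGT",
                          "CAAETSGSRLTF;CASSQEGTGVYEQYF"))

# Distinct values, kept in order of first appearance
u <- unique(df$cdr3)

# Position of each row's value within u; duplicates map to the same position
idx <- match(df$cdr3, u)
idx
# [1] 1 2 3 1

# Prefix the positions to build the ID column
df$ID <- paste0("X", idx)
```

Because unique() keeps first-appearance order, the numbering follows the row order of the data rather than alphabetical order.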
CodePudding user response:
Here is a tidyverse solution using fct_inorder() from the forcats package. fct_inorder() sets the factor levels in order of first appearance, so the integer codes of the factor number the values in that order. Note that fct_inorder() should be given only the column: its second argument is `ordered`, so passing row_number() there triggers "the condition has length > 1" warnings.
library(tidyverse)
as_tibble(df) %>%
  mutate(cdr3 = fct_inorder(cdr3)) %>%
  mutate(ID = paste0("X", as.integer(cdr3)))
cdr3 ID
<fct> <chr>
1 CAAETSGSRLTF;CASSQEGTGVYEQYF X1
2 CGSRLTF;CASSQEGTGVYEQYF X2
3 CAAETSGSRLTF;CASSQEGT X3
4 CAAETSGSRLTF;CASSQEGTGVYEQYF X1
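If loading forcats is not desirable, the same first-appearance ordering can probably be had from a base factor() with explicit levels. This is a minimal sketch, assuming a character vector without missing values; make_id() is a hypothetical helper name and does not come from either answer.

```r
# Hypothetical helper: label each distinct value by order of first appearance
make_id <- function(x, prefix = "X") {
  # levels = unique(x) keeps first-appearance order;
  # a plain factor(x) would sort the levels instead
  f <- factor(x, levels = unique(x))
  paste0(prefix, as.integer(f))
}

cdr3 <- c("CAAETSGSRLTF;CASSQEGTGVYEQYF",
          "CGSRLTF;CASSQEGTGVYEQYF",
          "CAAETSGSRLTF;CASSQEGT",
          "CAAETSGSRLTF;CASSQEGTGVYEQYF")

make_id(cdr3)
# [1] "X1" "X2" "X3" "X1"
```

Wrapping the logic in a function makes it easy to reuse with a different prefix, for example make_id(cdr3, prefix = "TCR").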
