I have this example data frame:
df <- structure(list(PC1 = c(-0.0277818345657933, -0.0342426301759117,
-0.0328199061848987, 0.0557338197779853, 0.042369402931087),
PC2 = c(-0.0149291182738773, -0.00862145986823889, -0.0101822421485786,
-0.00862630869071877, -0.00419434673647331)), row.names = c("Homo sapiens - ULAC-0968",
"Homo sapiens - ULAC-0978", "Homo sapiens - ULAC-0996", "Pan troglodytes - HTB2804",
"Pan troglodytes - HTB411"), class = "data.frame")
I would like to create an extra column, named Species, that contains the species part of the row names (the text before " - "). In this case, the factor levels would be only Homo sapiens and Pan troglodytes.
How could I proceed?
Answer:
library(tidyverse)
df %>%
  rownames_to_column(var = 'Species') %>%
  # keep the text before " - " and convert the result to a factor
  mutate(Species = factor(sapply(strsplit(Species, split = ' - '), function(x) x[1])))
Output:
Species PC1 PC2
<fct> <dbl> <dbl>
1 Homo sapiens -0.0278 -0.0149
2 Homo sapiens -0.0342 -0.00862
3 Homo sapiens -0.0328 -0.0102
4 Pan troglodytes 0.0557 -0.00863
5 Pan troglodytes 0.0424 -0.00419
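If you also want to keep the sample ID rather than discard it, one possible variant splits the row names into two columns with tidyr::separate(). This is a sketch assuming the tibble, tidyr and dplyr packages are installed; the column names Label and Sample are illustrative choices, not part of the question. Newer tidyr releases mark separate() as superseded in favour of separate_wider_delim(), but it still works.

```r
library(tibble)
library(tidyr)
library(dplyr)

# Rebuild the example data with the same values and row names as the question
df <- data.frame(
  PC1 = c(-0.0277818345657933, -0.0342426301759117, -0.0328199061848987,
          0.0557338197779853, 0.042369402931087),
  PC2 = c(-0.0149291182738773, -0.00862145986823889, -0.0101822421485786,
          -0.00862630869071877, -0.00419434673647331),
  row.names = c("Homo sapiens - ULAC-0968", "Homo sapiens - ULAC-0978",
                "Homo sapiens - ULAC-0996", "Pan troglodytes - HTB2804",
                "Pan troglodytes - HTB411")
)

res <- df %>%
  # move the row names into a regular column (hypothetical name "Label")
  rownames_to_column(var = "Label") %>%
  # split on the spaced separator only, so "ULAC-0968" stays in one piece
  separate(Label, into = c("Species", "Sample"), sep = " - ") %>%
  # store the species as a factor
  mutate(Species = factor(Species))

res
```

Splitting on " - " rather than on a bare "-" is what keeps IDs such as ULAC-0968 intact.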
Answer:
A base R option.
Use sub() to drop everything from the first "-" onward, trimws() to remove the leftover trailing space, and then delete the row names.
df$Species <- trimws(sub('-.*', '', rownames(df)))
rownames(df) <- NULL
df
# PC1 PC2 Species
#1 -0.0278 -0.01493 Homo sapiens
#2 -0.0342 -0.00862 Homo sapiens
#3 -0.0328 -0.01018 Homo sapiens
#4 0.0557 -0.00863 Pan troglodytes
#5 0.0424 -0.00419 Pan troglodytes
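Since the question mentions factors, here is a variant of the base R approach that stores Species as a factor. It is a sketch assuming a factor is what you want; matching the full " - " separator removes the need for trimws().

```r
# Rebuild the example data with the same values and row names as the question
df <- data.frame(
  PC1 = c(-0.0277818345657933, -0.0342426301759117, -0.0328199061848987,
          0.0557338197779853, 0.042369402931087),
  PC2 = c(-0.0149291182738773, -0.00862145986823889, -0.0101822421485786,
          -0.00862630869071877, -0.00419434673647331),
  row.names = c("Homo sapiens - ULAC-0968", "Homo sapiens - ULAC-0978",
                "Homo sapiens - ULAC-0996", "Pan troglodytes - HTB2804",
                "Pan troglodytes - HTB411")
)

# Remove " - " and everything after it, then convert to a factor
df$Species <- factor(sub(" - .*$", "", rownames(df)))

# Drop the row names, as in the answer above
rownames(df) <- NULL

# The factor should have exactly two levels: Homo sapiens and Pan troglodytes
levels(df$Species)
```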
