Create a factor column based on shared text in the row names of a data frame in R

I have this example data frame:

df <- structure(list(PC1 = c(-0.0277818345657933, -0.0342426301759117, 
-0.0328199061848987, 0.0557338197779853, 0.042369402931087), 
    PC2 = c(-0.0149291182738773, -0.00862145986823889, -0.0101822421485786, 
    -0.00862630869071877, -0.00419434673647331)), row.names = c("Homo sapiens - ULAC-0968", 
"Homo sapiens - ULAC-0978", "Homo sapiens - ULAC-0996", "Pan troglodytes - HTB2804", 
"Pan troglodytes - HTB411"), class = "data.frame")

I would like to create an extra column, named Species, that holds the species part of each row name as a factor. In this case, the only factor levels would be Homo sapiens and Pan troglodytes.

How could I proceed?

CodePudding user response：

library(tidyverse)

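df %>%
  # move the row names into a regular column called Species
  rownames_to_column(var = 'Species') %>%
  # keep the text before ' -' and convert the whole column to a factor in one step
  mutate(Species = factor(sapply(strsplit(Species, split = ' -'), function(x) x[1])))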

Output:

 Species             PC1      PC2
  <fct>             <dbl>    <dbl>
1 Homo sapiens    -0.0278 -0.0149 
2 Homo sapiens    -0.0342 -0.00862
3 Homo sapiens    -0.0328 -0.0102 
4 Pan troglodytes  0.0557 -0.00863
5 Pan troglodytes  0.0424 -0.00419

CodePudding user response：

Base R option.

Use sub to drop everything from the first hyphen onward, trim the leftover trailing space with trimws, and then delete the row names.

df$Species <- factor(trimws(sub('-.*', '', rownames(df))))  # factor(), since a factor column was requested
rownames(df) <- NULL
df

#      PC1      PC2         Species
#1 -0.0278 -0.01493    Homo sapiens
#2 -0.0342 -0.00862    Homo sapiens
#3 -0.0328 -0.01018    Homo sapiens
#4  0.0557 -0.00863 Pan troglodytes
#5  0.0424 -0.00419 Pan troglodytes
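
As a further sketch in base R, the pattern can match the whole " - " separator instead of a bare hyphen. This assumes every row name follows the "Species - ID" layout of the example. Under that assumption a hyphen inside a species name would not be cut, and no trimws call is needed. The levels() and table() calls are only there to check the result.

```r
# Same example data as in the question.
df <- data.frame(
  PC1 = c(-0.0277818345657933, -0.0342426301759117, -0.0328199061848987,
          0.0557338197779853, 0.042369402931087),
  PC2 = c(-0.0149291182738773, -0.00862145986823889, -0.0101822421485786,
          -0.00862630869071877, -0.00419434673647331),
  row.names = c("Homo sapiens - ULAC-0968", "Homo sapiens - ULAC-0978",
                "Homo sapiens - ULAC-0996", "Pan troglodytes - HTB2804",
                "Pan troglodytes - HTB411")
)

# Remove the first " - " and everything after it, then convert to a factor.
# Assumption: every row name has the form "<species> - <specimen id>".
df$Species <- factor(sub(" - .*$", "", rownames(df)))

# Inspect the factor levels and how many rows fall into each one.
levels(df$Species)
table(df$Species)
```

The row names are left in place here; add rownames(df) <- NULL afterwards if they are no longer needed.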