When I run this code, I get an error:
genes<-colnames(survdata)[-c(1:3)]
univ_formulas<-sapply(genes,function(x)as.formula(paste('Surv(OS,status)~',x)))
Error in str2lang(x) : <text>:1:31: unexpected symbol
1: Surv(OS,status)~ ABC7-42389800N19.1
^
If I remove the element and run the code again, a similar error appears again:
univ_formulas<-sapply(genes,function(x)as.formula(paste('Surv(OS,status)~',x)))
Error in str2lang(x) : <text>:1:26: unexpected symbol
1: Surv(OS,status)~ CITF22-1A6.3
^
I don't know what is wrong.
Example of the data:
head(genes,n = 50)
[1] "A1BG" "A1BG-AS1" "A2M"
[4] "A2M-AS1" "A2ML1" "A2MP1"
[7] "A3GALT2" "A4GALT" "AAAS"
[10] "AACS" "AACSP1" "AADAT"
[13] "AAED1" "AAGAB" "AAK1"
[16] "AAMDC" "AAMP" "AANAT"
[19] "AAR2" "AARD" "AARS"
[22] "AARS2" "AARSD1" "AASDH"
[25] "AASDHPPT" "AASS" "AATF"
[28] "AATK" "AATK-AS1" "ABAT"
[31] "ABC7-42389800N19.1" "ABCA1" "ABCA10"
[34] "ABCA11P" "ABCA12" "ABCA13"
[37] "ABCA17P" "ABCA2" "ABCA3"
[40] "ABCA4" "ABCA5" "ABCA6"
[43] "ABCA7" "ABCA8" "ABCA9"
[46] "ABCB1" "ABCB10" "ABCB4"
[49] "ABCB6" "ABCB7"
CodePudding user response:
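This happens because some gene names contain a hyphen (-), which base::str2lang (the function named in the error) parses as the subtraction operator rather than as part of a name. In "ABC7-42389800N19.1" the parser reads ABC7 minus the number 42389800 and then hits N19.1, hence "unexpected symbol". Names such as "A1BG-AS1" do not raise an error, but they are silently read as A1BG - AS1, which gives the wrong formula. We can fix this as follows:
- "Clean" the gene names by converting - to _, and document this somewhere.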
We then have:
genes <- c("ABC7-42389800N19.1", "AATK-AS1")
# gsub replaces every hyphen; sub would replace only the first one
sapply(genes, function(x) as.formula(paste('Surv(OS,status)~',
gsub("-", "_", x, fixed = TRUE))))
$`ABC7-42389800N19.1`
Surv(OS, status) ~ ABC7_42389800N19.1
<environment: 0x000002ad508b58e8>
$`AATK-AS1`
Surv(OS, status) ~ AATK_AS1
<environment: 0x000002ad508b3c30>
This is an illustration of why that is the case:
A <- 4; B<- 20
str2lang("A-B")
A - B
eval(str2lang("A-B"))
[1] -16
str2lang is essentially the parsing half of the dreaded eval(parse()) idiom. From the docs, this is what it does:
str2expression(s) and str2lang(s) return special versions of parse(text=s, keep.source=FALSE) and can therefore be regarded as transforming character strings s to expressions, calls, etc.
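To make both failure modes concrete, here is a small base-R sketch using gene names from the question. It only parses the strings and never fits a model, so it does not need the survival package; the variable names (silent, failed, same) are just for illustration.

```r
# Valid R syntax, but not the intended single name: parsed as a subtraction.
silent <- str2lang("A1BG-AS1")
print(silent)  # a call to `-` with two symbols, not one gene name
print(all.vars(as.formula("Surv(OS,status) ~ A1BG-AS1")))

# Digits right after the hyphen are read as a number, so the letters that
# follow are an unexpected symbol and parsing fails.
failed <- tryCatch(str2lang("CITF22-1A6.3"), error = function(e) e)
print(inherits(failed, "error"))

# str2lang(s) should give the same call as parse(text = s, keep.source = FALSE)[[1]]
same <- identical(str2lang("A-B"),
                  parse(text = "A-B", keep.source = FALSE)[[1]])
print(same)
```

The first case is the more dangerous one in practice, because no error is raised and the formula quietly refers to two variables instead of one.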
NOTE
- Since this is to be used in modeling, it is probably better to perform the substitution at the colnames stage, so that the input data to the model has the names we expect:
# not tested but you get the idea
# gsub (rather than sub) so that names with several hyphens are fully cleaned
colnames(survdata)[-c(1:3)] <- gsub("-", "_", colnames(survdata)[-c(1:3)], fixed = TRUE)
- It is important, for biological/research purposes, to document why gene names were cleaned as suggested in this answer.
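One way to combine the two notes above is to rename the columns once and keep a lookup table of original and cleaned names. This is a minimal sketch on a made-up toy data frame standing in for survdata (the id column and all values are hypothetical); it assumes, as in the question, that the first three columns are not genes.

```r
# Hypothetical toy data standing in for the real survdata
survdata <- data.frame(
  id = 1:3, OS = c(10, 20, 30), status = c(1, 0, 1),
  `A1BG-AS1` = c(0.1, 0.2, 0.3),
  `ABC7-42389800N19.1` = c(1.5, 2.5, 3.5),
  check.names = FALSE  # keep the hyphenated names as they are
)

gene_cols <- seq_along(survdata)[-(1:3)]
original  <- colnames(survdata)[gene_cols]
cleaned   <- gsub("-", "_", original, fixed = TRUE)

# Guard against two different genes collapsing to the same cleaned name
stopifnot(!anyDuplicated(cleaned))

# Lookup table that documents the renaming
name_map <- data.frame(original = original, cleaned = cleaned,
                       stringsAsFactors = FALSE)

# Rename the data, then build formulas from the cleaned names
colnames(survdata)[gene_cols] <- cleaned
univ_formulas <- lapply(cleaned, function(x) {
  as.formula(paste("Surv(OS, status) ~", x))
})
names(univ_formulas) <- original  # results stay labelled with the real gene names

print(name_map)
```

Saving name_map alongside the results (for example with write.csv) lets you map cleaned names back to the original gene symbols when reporting.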
