Here is a representation of my dataset:
mydata<-data.frame(ID=1:3, str=c("ANN_ABL_ABL","ABL", "SLE_ANN"))
I want to calculate the number of elements in the string of each observation, in order to have a dataset like below.
  ID         str number_of_elements
1  1 ANN_ABL_ABL                  3
2  2         ABL                  1
3  3     SLE_ANN                  2
CodePudding user response:
A possible solution, using stringr::str_count:
library(tidyverse)
mydata<-data.frame(ID=1:3, str=c("ANN_ABL_ABL","ABL", "SLE_ANN"))
mydata %>%
  # number of elements = number of "_" separators + 1
  mutate(n = str_count(str, "_") + 1)
#>   ID         str n
#> 1  1 ANN_ABL_ABL 3
#> 2  2         ABL 1
#> 3  3     SLE_ANN 2
CodePudding user response:
A base R option:
transform(
  mydata,
  # drop everything except "_", count what is left, then add 1
  number_of_elements = nchar(gsub("[^_]", "", str)) + 1
)
gives
  ID         str number_of_elements
1  1 ANN_ABL_ABL                  3
2  2         ABL                  1
3  3     SLE_ANN                  2
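To see why counting underscores and adding one works, the steps can be run separately on a plain character vector. This is only a small sketch; the names x, underscores_only and counts are made up for illustration and are not part of the answer above.

```r
# Illustrative vector with the same strings as mydata$str
x <- c("ANN_ABL_ABL", "ABL", "SLE_ANN")

# Step 1: remove every character that is not an underscore
underscores_only <- gsub("[^_]", "", x)
underscores_only
#> [1] "__" ""   "_"

# Step 2: the number of remaining characters is the number of separators;
# a string with k separators has k + 1 elements
counts <- nchar(underscores_only) + 1
counts
#> [1] 3 1 2
```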
CodePudding user response:
A split-by-row approach within the tidyverse:
library(dplyr)
library(tidyr)
mydata %>%
  tidyr::separate_rows(str, sep = "_") %>%
  dplyr::count(ID, name = "number_of_elements") %>%
  dplyr::left_join(mydata, by = "ID") %>%
  dplyr::relocate(number_of_elements, .after = str)
# A tibble: 3 x 3
     ID str         number_of_elements
  <int> <chr>                    <int>
1     1 ANN_ABL_ABL                  3
2     2 ABL                          1
3     3 SLE_ANN                      2
CodePudding user response:
The scan function is set up to pull apart lines of text. Its default first parameter is a file name, but a text parameter was added a couple of years ago. You can write an equivalent function whose first parameter is the text; I also chose to make "character" the default for the expected type of input (what="").
scant <- function(txt, ...){scan(text=txt, what="", quiet=TRUE, ...) }
I went through those gymnastics so that scan can be used within an lapply call, where each string has to arrive as the text argument rather than as a file name:
lengths( lapply(mydata$str, scant, sep="_") )
I could have used an anonymous, throwaway function to do this in one line, but I decided instead to put this helper function in my .Rprofile setup. For many years I had a somewhat similar read.txt function that used a textConnection to supply character data to the read.table function. It became unnecessary when the text parameter was added to scan.
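For comparison, the anonymous-function version mentioned above might look something like this. It is a sketch rather than the exact code the answer had in mind; the name n_elements is made up, and stringsAsFactors = FALSE is set explicitly so that str is a character vector on older R versions too.

```r
# Same data as in the question; keep str as character, not factor
mydata <- data.frame(ID = 1:3,
                     str = c("ANN_ABL_ABL", "ABL", "SLE_ANN"),
                     stringsAsFactors = FALSE)

# Split each string on "_" with scan(text = ...) inside a throwaway function,
# then count the pieces per observation
n_elements <- lengths(
  lapply(mydata$str,
         function(txt) scan(text = txt, what = "", sep = "_", quiet = TRUE))
)
n_elements
#> [1] 3 1 2
```

The result can be attached to the data frame with mydata$number_of_elements <- n_elements.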
