How to calculate the number of elements in a string for each observation

Time:01-08

Here is a representation of my dataset:

mydata<-data.frame(ID=1:3, str=c("ANN_ABL_ABL","ABL", "SLE_ANN"))

I want to count the number of underscore-separated elements in the string of each observation, so that I end up with a dataset like the one below.

  ID         str number_of_elements
1  1 ANN_ABL_ABL                  3
2  2         ABL                  1
3  3     SLE_ANN                  2

CodePudding user response:

A possible solution, using stringr::str_count:

library(tidyverse)

mydata<-data.frame(ID=1:3, str=c("ANN_ABL_ABL","ABL", "SLE_ANN"))

mydata %>% 
  mutate(n = str_count(str, "_") + 1)

#>   ID         str n
#> 1  1 ANN_ABL_ABL 3
#> 2  2         ABL 1
#> 3  3     SLE_ANN 2

CodePudding user response:

A base R option:

transform(
  mydata,
  number_of_elements = nchar(gsub("[^_]","",str)) + 1
)

gives

  ID         str number_of_elements
1  1 ANN_ABL_ABL                  3
2  2         ABL                  1
3  3     SLE_ANN                  2
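
Both answers above count separators and add one. As a rough sketch of another base R route (not part of the original answers), you can split each string on "_" and count the pieces with lengths(). The names empty_by_split and empty_by_separator are only illustrative. The two routes agree on the example data but differ for an empty string, which matters if your data may contain blanks.

```r
# Example data from the question; stringsAsFactors = FALSE keeps str as
# character on R versions before 4.0 as well.
mydata <- data.frame(ID = 1:3,
                     str = c("ANN_ABL_ABL", "ABL", "SLE_ANN"),
                     stringsAsFactors = FALSE)

# Split every string on a literal "_" and count the resulting pieces.
mydata$number_of_elements <- lengths(strsplit(mydata$str, "_", fixed = TRUE))
print(mydata)

# Edge case: an empty string yields zero pieces when split,
# but the "count separators, add one" formula reports 1.
empty_by_split     <- lengths(strsplit("", "_", fixed = TRUE))  # 0
empty_by_separator <- nchar(gsub("[^_]", "", "")) + 1           # 1
```

Which behaviour is right depends on whether an empty string should count as zero elements or one in your data.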

CodePudding user response:

A split-by-row approach within the tidyverse:

library(dplyr)
library(tidyr)

mydata %>%
    tidyr::separate_rows(str, sep = "_") %>% 
    dplyr::count(ID, name = "number_of_elements") %>%
    dplyr::left_join(mydata, by = "ID") %>%
    dplyr::relocate(number_of_elements, .after = str)

# A tibble: 3 x 3
     ID str         number_of_elements
  <int> <chr>                    <int>
1     1 ANN_ABL_ABL                  3
2     2 ABL                          1
3     3 SLE_ANN                      2

CodePudding user response:

The scan function is set up to pull apart lines of text. Its default first parameter is a file name, but a text parameter was added a couple of years ago. You can write a small wrapper whose first parameter is the text; I also chose to make "character" the default for the expected type of input.

scant <- function(txt, ...){scan(text=txt, what="", quiet=TRUE, ...) }

I went through those gymnastics to allow the scant function to work within an lapply call:

lengths( lapply(mydata$str, scant, sep="_") )

I could have used an anonymous, throwaway function to do this in one line, but I decided instead to put this helper function in my .Rprofile setup. For many years I had a somewhat similar read.txt function that used a textConnection to supply character data to the read.table function. It became unnecessary when the text parameter was added to scan.
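
For reference, the one-line variant with an anonymous function mentioned above might look like the sketch below. It assumes the same mydata as in the question, and n_elements is just an illustrative name.

```r
# Example data from the question, with str kept as character.
mydata <- data.frame(ID = 1:3,
                     str = c("ANN_ABL_ABL", "ABL", "SLE_ANN"),
                     stringsAsFactors = FALSE)

# Call scan() on each string through a throwaway function, then count
# how many fields each call returned.
n_elements <- lengths(lapply(mydata$str, function(x) {
  scan(text = x, what = "", sep = "_", quiet = TRUE)
}))
n_elements  # 3 1 2
```

This avoids defining a helper, at the cost of repeating the scan arguments wherever the pattern is needed.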
