Context
I have a character vector a.
I want to extract the text between the last slash (/) and the .nc extension using the str_extract() function.
I tried str_extract(a, "(?=/).*(?=.nc)"), but it does not return what I expect.
Question
How can I get the text between the last slash and .nc for each element of the character vector a?
Reproducible code
a = c(
'data/temp/air/pm2.5/pm2.5_year_2014.nc',
'data/temp/air/pm10/pm10_year_2014.nc',
'efcv/asdfe/weewr/rtrkhh/ss_fef_10233_dfdfe.nc'
)
# My solution (failed)
library(stringr)
str_extract(a, "(?=/).*(?=.nc)")
# [1] "/temp/air/pm2.5/pm2.5_year_2014"
# [2] "/temp/air/pm10/pm10_year_2014"
# [3] "/asdfe/weewr/rtrkhh/ss_fef_10233_dfdfe"
# The expected output should look like this:
# [1] "pm2.5_year_2014"
# [2] "pm10_year_2014"
# [3] "ss_fef_10233_dfdfe"
Answer
Here is a regex replacement approach:
a = c(
'data/temp/air/pm2.5/pm2.5_year_2014.nc',
'data/temp/air/pm10/pm10_year_2014.nc',
'efcv/asdfe/weewr/rtrkhh/ss_fef_10233_dfdfe.nc'
)
output <- gsub(".*/|\\.[^.]+$", "", a)
output
[1] "pm2.5_year_2014" "pm10_year_2014" "ss_fef_10233_dfdfe"
Here is the regex logic:
.*/        match everything from the start of the string until the last /
|          OR
\.[^.]+$   match everything from the final dot until the end of the string
Then we replace these matches with an empty string to remove them, leaving behind the file names without their extension.
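If you would rather stay with str_extract(), as in the question, here is a minimal sketch. The original attempt fails because (?=/) is satisfied at the first slash, the greedy .* then runs on from there, and the dot in .nc is unescaped, so it matches any character. The pattern below instead takes the last run of non-slash characters that is followed by a literal .nc at the end of the string. It assumes every element ends in .nc; the base R line is only a cross-check and is not part of the approach above.

```r
library(stringr)

a <- c(
  'data/temp/air/pm2.5/pm2.5_year_2014.nc',
  'data/temp/air/pm10/pm10_year_2014.nc',
  'efcv/asdfe/weewr/rtrkhh/ss_fef_10233_dfdfe.nc'
)

# [^/]+        one or more characters that are not a slash
# (?=\\.nc$)   lookahead: must be followed by a literal ".nc" at the end
output_stringr <- str_extract(a, "[^/]+(?=\\.nc$)")
output_stringr
# [1] "pm2.5_year_2014"    "pm10_year_2014"     "ss_fef_10233_dfdfe"

# Cross-check without a hand-written regex: drop the directory part,
# then drop the file extension.
output_base <- tools::file_path_sans_ext(basename(a))
identical(output_stringr, output_base)
```

Elements that do not end in .nc come back as NA from str_extract(), whereas the gsub() approach would strip whatever extension they have.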
