Home > Enterprise >  Move information to new column if the first value of the cell is a four-digit number
Move information to new column if the first value of the cell is a four-digit number

Time:01-08

I have a column with addresses. The data is not clean and the information includes street and house number or sometimes postcode and city. I would like to move the postcode and city information to another column with R, while street and house number stay in the old place. The postcode is a 4 digit number string. I am grateful for any suggestion for a solution.

Table right now

Desired outcome

CodePudding user response:

An ifelse with grepl should help -

library(dplyr)

df <- df %>%
  mutate(Strasse = ifelse(grepl('^\\d{4}', Halter), '', Halter), 
         Ort = ifelse(Strasse == '', Halter, ''))

#  Line          Halter     Strasse             Ort
#1    1        1007 Abc                    1007 Abc
#2    2 1012 Long words             1012 Long words
#3    3     Enelbach 54 Enelbach 54                
#4    4         Abcd 56     Abcd 56                
#5    5      Engasse 21  Engasse 21                

grepl('^\\d{4}', Halter) returns TRUE if it finds a 4-digit number at the start of the string else returns FALSE.

data

It is easier to help if you provide data in a reproducible format

df <- data.frame(Line = 1:5, 
                 Halter = c('1007 Abc', '1012 Long words', 'Enelbach 54', 
                            'Abcd 56', 'Engasse 21'))

CodePudding user response:

In addition to the neat solution of @Ronak Shah, if you want to use base R

df <- data.frame(Line = 1:5, 
                 Halter = c('1007 Abc', '1012 Long words', 'Enelbach 54', 
                            'Abcd 56', 'Engasse 21'))
df$Strasse <- with(df, ifelse(grepl('^\\d{4}', Halter), '', Halter))
df$Ort <- with(df, ifelse(Strasse == '', Halter, ''))
> head(df)
  Line          Halter     Strasse             Ort
1    1        1007 Abc                    1007 Abc
2    2 1012 Long words             1012 Long words
3    3     Enelbach 54 Enelbach 54                
4    4         Abcd 56     Abcd 56                
5    5      Engasse 21  Engasse 21                

CodePudding user response:

An option is also with separate

library(dplyr)
library(tidyr)
df %>% 
  separate(Halter, into = c("Strasse", "Ort"), sep = "(?<=[0-9])$|^(?=[0-9]{4} )")
  Line     Strasse             Ort
1    1                    1007 Abc
2    2             1012 Long words
3    3 Enelbach 54                
4    4     Abcd 56                
5    5  Engasse 21                

data

df <- structure(list(Line = 1:5, Halter = c("1007 Abc", "1012 Long words", 
"Enelbach 54", "Abcd 56", "Engasse 21")), class = "data.frame", row.names = c(NA, 
-5L))
  •  Tags:  
  • Related