I have a column with addresses. The data is not clean and the information includes street and house number or sometimes postcode and city. I would like to move the postcode and city information to another column with R, while street and house number stay in the old place. The postcode is a 4 digit number string. I am grateful for any suggestion for a solution.


CodePudding user response:
An ifelse with grepl should help -
library(dplyr)
df <- df %>%
mutate(Strasse = ifelse(grepl('^\\d{4}', Halter), '', Halter),
Ort = ifelse(Strasse == '', Halter, ''))
# Line Halter Strasse Ort
#1 1 1007 Abc 1007 Abc
#2 2 1012 Long words 1012 Long words
#3 3 Enelbach 54 Enelbach 54
#4 4 Abcd 56 Abcd 56
#5 5 Engasse 21 Engasse 21
grepl('^\\d{4}', Halter) returns TRUE if it finds a 4-digit number at the start of the string else returns FALSE.
data
It is easier to help if you provide data in a reproducible format
df <- data.frame(Line = 1:5,
Halter = c('1007 Abc', '1012 Long words', 'Enelbach 54',
'Abcd 56', 'Engasse 21'))
CodePudding user response:
In addition to the neat solution of @Ronak Shah, if you want to use base R
df <- data.frame(Line = 1:5,
Halter = c('1007 Abc', '1012 Long words', 'Enelbach 54',
'Abcd 56', 'Engasse 21'))
df$Strasse <- with(df, ifelse(grepl('^\\d{4}', Halter), '', Halter))
df$Ort <- with(df, ifelse(Strasse == '', Halter, ''))
> head(df)
Line Halter Strasse Ort
1 1 1007 Abc 1007 Abc
2 2 1012 Long words 1012 Long words
3 3 Enelbach 54 Enelbach 54
4 4 Abcd 56 Abcd 56
5 5 Engasse 21 Engasse 21
CodePudding user response:
An option is also with separate
library(dplyr)
library(tidyr)
df %>%
separate(Halter, into = c("Strasse", "Ort"), sep = "(?<=[0-9])$|^(?=[0-9]{4} )")
Line Strasse Ort
1 1 1007 Abc
2 2 1012 Long words
3 3 Enelbach 54
4 4 Abcd 56
5 5 Engasse 21
data
df <- structure(list(Line = 1:5, Halter = c("1007 Abc", "1012 Long words",
"Enelbach 54", "Abcd 56", "Engasse 21")), class = "data.frame", row.names = c(NA,
-5L))
