I have a data frame with 27 columns. Every column holds strings with a structure similar to the one below.
principal_amt <- c('"pa": "5975.00"', '"pa": "2285.00"', '"pa": "15822.00"')
closed_accounts <- c( '"ca": 0', '"ca": 3', '"ca": 0')
status <- c(' "loan_status": "Paid" ', ' "loan_status": "Funded"',' "loan_status": "Funded"')
DF <- data.frame(principal_amt, closed_accounts)
I want to automatically strip the keys and double quotes from the observations so that the final data frame looks like this.
principal_amt <- c(5975.00, 2285.00, 15822.00)
closed_accounts <- c(0, 3, 0)
status <- c('Paid','Funded','Funded')
DF_Final <- data.frame(principal_amt, closed_accounts)
How do I go about this?
CodePudding user response:
The readr package ships with a handy parse_number function for such use cases.
library(tidyverse)
# parse_number() drops the non-numeric characters around the first number in each string
DF %>%
  mutate(across(everything(), parse_number))
  principal_amt closed_accounts
1          5975               0
2          2285               3
3         15822               0
CodePudding user response:
This will do the job. gsub() returns character vectors, so wrap the result in as.numeric() to get numbers.
principal_amt <- as.numeric(gsub("[^0-9.-]", "", c('"pa": "5975.00"', '"pa": "2285.00"', '"pa": "15822.00"')))
closed_accounts <- as.numeric(gsub("[^0-9.-]", "", c('"ca": 0', '"ca": 3', '"ca": 0')))
DF <- data.frame(principal_amt, closed_accounts)
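To see what the pattern "[^0-9.-]" actually keeps, here is a small sketch on three sample strings modelled on the question (the names pattern, samples and stripped are only for illustration). It deletes every character that is not a digit, a dot or a minus sign, which works for numeric columns but empties a text column such as status.

```r
# Remove every character that is NOT a digit, a dot or a minus sign
pattern <- "[^0-9.-]"

samples <- c('"pa": "5975.00"', '"ca": 3', ' "loan_status": "Paid" ')

stripped <- gsub(pattern, "", samples)
print(stripped)

# The numeric strings convert cleanly; the emptied status string becomes NA
print(as.numeric(stripped))
```

So this approach is probably best applied only to the columns that are known to be numeric.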
CodePudding user response:
Base R
# Strip everything except digits, dots and minus signs, then convert each column to numeric
DF[] <- lapply(DF, function(x) as.numeric(gsub('[^0-9.-]', '', x)))
Output
> str(DF)
'data.frame': 3 obs. of 2 variables:
$ principal_amt : num 5975 2285 15822
$ closed_accounts: num 0 3 0
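The digit-only pattern turns a text column such as status into NA. If the text columns need to survive as well, one possible sketch is to strip only the leading "key": part and the remaining quotes, and then let type.convert() decide the column types. The helper name clean_value is made up for this example, and it assumes every cell follows the '"key": value' layout shown in the question.

```r
principal_amt   <- c('"pa": "5975.00"', '"pa": "2285.00"', '"pa": "15822.00"')
closed_accounts <- c('"ca": 0', '"ca": 3', '"ca": 0')
status <- c(' "loan_status": "Paid" ', ' "loan_status": "Funded"', ' "loan_status": "Funded"')
DF <- data.frame(principal_amt, closed_accounts, status)

# Hypothetical helper: assumes each cell looks like '"key": value'
clean_value <- function(x) {
  x <- sub('^[[:space:]]*"[^"]*"[[:space:]]*:[[:space:]]*', "", x)  # drop the leading "key":
  x <- gsub('"', "", x, fixed = TRUE)                               # drop the remaining quotes
  trimws(x)                                                         # drop stray whitespace
}

DF_Final <- DF
DF_Final[] <- lapply(DF, clean_value)              # clean every column, keep the data frame shape
DF_Final <- type.convert(DF_Final, as.is = TRUE)   # numbers become numeric, text stays character

str(DF_Final)
```

Because the cleaning is applied with lapply() over all columns, the same call should carry over to the full 27-column data frame without listing the columns by hand.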
