I'm a bit new to R and programming in general. I have to clean a lot of data, and often it's a similar issue in multiple columns. So, I would like to use a loop, rather than writing out each line of code. I have data similar to this:
black <- c("1.33%", "9.22%", "10.71%")
white <- c("5.23%", "8.12%", "11.72%")
day <- c("Wednesday", "Thursday", "Friday")
blue <- c("2.21%", "1.12%", "8.79%")
df <- data.frame(black, white, day, blue)
This gets me a dataframe like this:
   black  white       day  blue
1  1.33%  5.23% Wednesday 2.21%
2  9.22%  8.12%  Thursday 1.12%
3 10.71% 11.72%    Friday 8.79%
I have read that there are 'for' loops, and also that the apply() family of functions works like a loop in R. How would I loop through the variables black, white and blue (but not day) so that I can:
- remove the % sign
- change type from char to numeric
- round to 1 decimal place?
As I said, I would like to know how to write this both as a for loop and with apply(). To remove the % sign I have used mutate() and gsub() before.
Thanks for your suggestions, particularly helping me to write legible code! Best, Roger
User response:
Here is one tidy way using dplyr:
library(dplyr)

# Strip the percent sign, convert to numeric, round to 1 decimal place
clean_my_data <- function(input) {
  gsub("%", "", input) %>% as.numeric() %>% round(1)
}

# Apply the cleaner to the three percentage columns only
df_new <- df %>%
  mutate(across(c(black, white, blue), clean_my_data))
df_new
#>   black white       day blue
#> 1   1.3   5.2 Wednesday  2.2
#> 2   9.2   8.1  Thursday  1.1
#> 3  10.7  11.7    Friday  8.8
Created on 2022-01-15 by the reprex package (v2.0.1)
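Since the question also asks about the apply() family, the same cleaning can be sketched in base R with lapply(), without dplyr. This is only a sketch: it rebuilds the example df from the question, and the names clean_pct and pct_cols are made up for illustration.

```r
# Rebuild the example data from the question
df <- data.frame(
  black = c("1.33%", "9.22%", "10.71%"),
  white = c("5.23%", "8.12%", "11.72%"),
  day   = c("Wednesday", "Thursday", "Friday"),
  blue  = c("2.21%", "1.12%", "8.79%")
)

# Hypothetical helper: strip "%", convert to numeric, round to 1 decimal place
clean_pct <- function(x) {
  round(as.numeric(gsub("%", "", x, fixed = TRUE)), 1)
}

# Hypothetical name for the columns to clean (day is left out on purpose)
pct_cols <- c("black", "white", "blue")

# lapply() returns one cleaned vector per column; assigning the list back
# into df[pct_cols] replaces those columns and leaves day untouched
df[pct_cols] <- lapply(df[pct_cols], clean_pct)
str(df)
```

Assigning the lapply() result back into df[pct_cols] keeps everything in one data frame, which is roughly what mutate(across(...)) does in the dplyr version.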
User response:
This is a quick and dirty way of doing it, and it can be improved!
First you need a function that does the job, then you apply that function to each column (or use a loop; it is up to you).
clean_color <- function(x) {
  # Just remove the last character; this can fail on data with
  # trailing whitespace, such as "1.38% "
  without_percent <- substr(x,
                            start = 1,
                            stop = nchar(x) - 1)
  # Second part: convert to numeric and round to 1 decimal place
  round(as.numeric(without_percent), 1)
}
Then you apply this function:
sapply(df[, c(1:2, 4)], clean_color)
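Note that sapply() returns a numeric matrix here and leaves df itself unchanged. Since the question also asks for a for loop, here is a minimal sketch that writes the cleaned values back into df. It rebuilds the example data from the question. It is a variation on the function above rather than the same method: trimws() and sub("%$", ...) are swapped in for substr() so that the "1.38% " case mentioned in the comment also works.

```r
# Rebuild the example data from the question
df <- data.frame(
  black = c("1.33%", "9.22%", "10.71%"),
  white = c("5.23%", "8.12%", "11.72%"),
  day   = c("Wednesday", "Thursday", "Friday"),
  blue  = c("2.21%", "1.12%", "8.79%")
)

# Variation on clean_color(): trim surrounding whitespace first, then drop
# a trailing "%" with a regular expression instead of cutting the last char
clean_color <- function(x) {
  x <- trimws(x)
  without_percent <- sub("%$", "", x)
  round(as.numeric(without_percent), 1)
}

# Loop over the column names and overwrite each column in place;
# day is simply not listed, so it is never touched
for (col in c("black", "white", "blue")) {
  df[[col]] <- clean_color(df[[col]])
}
print(df)
```

Looping over column names rather than positions such as c(1:2, 4) means the code keeps working if the column order changes.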
