How to turn whole dataframe from string variables into numbers?

Time:01-28

I have a data frame full of answers to a survey. Each column is filled with "Never", "Sometimes" and "Always", and I need to change "Never" to the number 0, "Sometimes" to 1 and "Always" to 2. Is there a way to apply this change to the whole data frame at once instead of column by column?

Answer:

Suppose your data frame looks like this:

df
#>           Q1        Q2        Q3
#> 1      Never    Always    Always
#> 2     Always     Never     Never
#> 3      Never     Never     Never
#> 4  Sometimes     Never     Never
#> 5      Never Sometimes     Never
#> 6     Always Sometimes Sometimes
#> 7     Always Sometimes     Never
#> 8  Sometimes Sometimes     Never
#> 9  Sometimes    Always Sometimes
#> 10    Always     Never Sometimes

Then you can do

df[] <- sapply(df, function(x) match(x, c("Never", "Sometimes", "Always")) - 1)

This results in:

df
#>    Q1 Q2 Q3
#> 1   0  2  2
#> 2   2  0  0
#> 3   0  0  0
#> 4   1  0  0
#> 5   0  1  0
#> 6   2  1  1
#> 7   2  1  0
#> 8   1  1  0
#> 9   1  2  1
#> 10  2  0  1

Reproducible data frame

set.seed(1)
df <- replicate(3, sample(c("Never", "Sometimes", "Always"), 10, TRUE))
df <- setNames(as.data.frame(df), c("Q1", "Q2", "Q3"))
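One caveat with `match()`: any answer outside the three labels (a typo, or "sometimes" in lower case) silently becomes `NA`. The following is a minimal sketch of a wrapper that stops on such values instead. It assumes that genuinely missing answers should stay `NA`. `recode_scale()` and the example data are hypothetical and not part of base R.

```r
# Hypothetical helper: recode answers to 0-based scores with match(),
# stopping if a non-missing value is not one of the expected labels.
recode_scale <- function(df, levels = c("Never", "Sometimes", "Always")) {
  df[] <- lapply(df, function(x) {
    codes <- match(x, levels) - 1L          # NA for anything not in `levels`
    bad <- is.na(codes) & !is.na(x)         # unmatched, but not originally missing
    if (any(bad)) {
      stop("Unexpected value(s): ", paste(unique(x[bad]), collapse = ", "))
    }
    codes
  })
  df
}

# Hypothetical example data with one genuinely missing answer
survey <- data.frame(
  Q1 = c("Never", "Always", NA),
  Q2 = c("Sometimes", "Never", "Always"),
  stringsAsFactors = FALSE
)
scores <- recode_scale(survey)
scores
#>   Q1 Q2
#> 1  0  1
#> 2  2  0
#> 3 NA  2
```

With this version, calling `recode_scale()` on a column containing "Often" raises an error rather than quietly producing `NA`.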

Answer:

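You could convert to factor and then to numeric (using Allan Cameron's sample data). The recoded columns can then be summed, for example into a total score:

library(dplyr)

# factor() with explicit levels gives codes 1, 2, 3 in scale order; subtract 1 to get 0, 1, 2
df[] <- sapply(df, function(x) as.numeric(factor(x, levels = c("Never", "Sometimes", "Always"))) - 1)
df %>%
  mutate(total = Q1 + Q2 + Q3)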

   Q1 Q2 Q3 total
1   0  2  2     4
2   2  0  0     2
3   0  0  0     0
4   1  0  0     1
5   0  1  0     1
6   2  1  1     4
7   2  1  0     3
8   1  1  0     2
9   1  2  1     4
10  2  0  1     3
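The explicit `levels` argument is what makes this approach work. Without it, `factor()` sorts the labels alphabetically (Always, Never, Sometimes), so the codes no longer follow the scale. Here is a small illustration on a made-up vector:

```r
scale_levels <- c("Never", "Sometimes", "Always")
answers <- c("Never", "Always", "Sometimes", "Never")

# Default levels are sorted alphabetically: Always = 1, Never = 2, Sometimes = 3
default_codes <- as.integer(factor(answers)) - 1L
default_codes
#> [1] 1 0 2 1

# Levels given in scale order: Never = 1, Sometimes = 2, Always = 3
ordered_codes <- as.integer(factor(answers, levels = scale_levels)) - 1L
ordered_codes
#> [1] 0 2 1 0
```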

Answer:

Another approach is to use a named vector as a lookup table. This is probably more appropriate if you want more flexibility in your translations.

set.seed(1)
df <- replicate(3, sample(c("Never", "Sometimes", "Always"), 10, TRUE))
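df <- setNames(as.data.frame(df, stringsAsFactors = FALSE), c("Q1", "Q2", "Q3"))

# Named lookup vector: names are the answers, values are the scores
lookup <- c(Never = 0L, Sometimes = 1L, Always = 2L)

# Indexing by character picks the matching score; unname() drops the carried-over names
as.data.frame(lapply(df, function(x) unname(lookup[x])))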

#    Q1 Q2 Q3
# 1   0  2  2
# 2   2  0  0
# 3   0  0  0
# 4   1  0  0
# 5   0  1  0
# 6   2  1  1
# 7   2  1  0
# 8   1  1  0
# 9   1  2  1
# 10  2  0  1
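As an example of that flexibility, a lookup vector can map several spellings to the same score. That is awkward to express with a single `levels` vector. The following sketch assumes the raw answers differ only in case or use "occasionally" as a synonym for "Sometimes". The data and labels are hypothetical, and any value not in the lookup still becomes `NA`.

```r
# Hypothetical lookup: lower-case keys, with a synonym sharing a score
lookup <- c(never = 0L, sometimes = 1L, occasionally = 1L, always = 2L)

raw <- data.frame(
  Q1 = c("Never", "occasionally", "ALWAYS"),
  Q2 = c("Sometimes", "Always", "never"),
  stringsAsFactors = FALSE
)

# tolower() normalizes case before the lookup; unname() drops carried-over names
recoded <- as.data.frame(lapply(raw, function(x) unname(lookup[tolower(x)])))
recoded
#>   Q1 Q2
#> 1  0  1
#> 2  1  2
#> 3  2  0
```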

Answer:

Since no one is using a tidyverse approach, I'll add one here.

Use across(everything()) to include all columns in the dataframe.
case_when() allows you to manually specify conditions and values.

The sample data is also from Allan Cameron.

library(dplyr)

set.seed(1)
df <- replicate(3, sample(c("Never", "Sometimes", "Always"), 10, TRUE))
df <- setNames(as.data.frame(df, stringsAsFactors = FALSE), c("Q1", "Q2", "Q3"))

df <- as_tibble(df)
df

# A tibble: 10 x 3
   Q1        Q2        Q3       
   <chr>     <chr>     <chr>    
 1 Never     Never     Always   
 2 Sometimes Never     Never    
 3 Sometimes Always    Sometimes
 4 Always    Sometimes Never    
 5 Never     Always    Never    
 6 Always    Sometimes Sometimes
 7 Always    Always    Never    
 8 Sometimes Always    Sometimes
 9 Sometimes Sometimes Always   
10 Never     Always    Sometimes

# Recode every column; values not matched by any condition become NA
df %>% mutate(across(
  everything(),
  ~ case_when(.x == "Always" ~ 2L,
              .x == "Sometimes" ~ 1L,
              .x == "Never" ~ 0L)
))

# A tibble: 10 x 3
      Q1    Q2    Q3
   <int> <int> <int>
 1     0     0     2
 2     1     0     0
 3     1     2     1
 4     2     1     0
 5     0     2     0
 6     2     1     1
 7     2     2     0
 8     1     2     1
 9     1     1     2
10     0     2     1
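`across(everything())` assumes that every column is a survey question. If the data frame also carries other columns, such as an ID or a free-text comment, one option is to recode only the columns whose non-missing values all belong to the scale. In dplyr, this could be written as `across(where(...))`. The following is a base R sketch of the same idea. The example data and `is_scale_col` are hypothetical. Note that a column that is entirely `NA` also passes this check.

```r
scale_levels <- c("Never", "Sometimes", "Always")

# Hypothetical data: an ID and a comment column next to the questions
survey <- data.frame(
  id = c("a1", "a2", "a3"),
  Q1 = c("Never", "Always", "Sometimes"),
  Q2 = c("Always", "Always", "Never"),
  comment = c("ok", "", "Never again"),
  stringsAsFactors = FALSE
)

# A column qualifies if every non-missing value is one of the scale labels
is_scale_col <- vapply(
  survey,
  function(x) all(x[!is.na(x)] %in% scale_levels),
  logical(1)
)

# Recode only the qualifying columns; the others are left untouched
survey[is_scale_col] <- lapply(survey[is_scale_col], function(x) match(x, scale_levels) - 1L)
survey
```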