Replacing labels of factor variable with added character in R

Time:01-28

I have a factor variable similar to the one in the example data set below. There are 15 levels in my actual data set, and each level is an interval. I would like to add a "K" to the end of each number (except 0) within those entries.

df <- read.table(text = "x1 x2 y
[0,60) 20 50
[0,60) 30.5 100
[120,180) 40.5 200
[120,180) 20.12 400
[120,180) 25 500
[120,180) 86 600
[540,600) 75 700
[840,900) 45 800", header = TRUE)

df$x1 <- as.factor(df$x1)

Ideal output, where each non-zero number has a "K" after it:

df <- read.table(text = "x1 x2 y
[0,60K) 20 50
[0,60K) 30.5 100
[120K,180K) 40.5 200
[120K,180K) 20.12 400
[120K,180K) 25 500
[120K,180K) 86 600
[540K,600K) 75 700
[840K,900K) 45 800", header = TRUE)

Is there any easy way to do this with grepl or something?
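On the grepl idea: grepl() only tests whether a pattern occurs and returns TRUE/FALSE; it never modifies the strings. The replacing is done by its siblings sub(), which rewrites the first match in each string, and gsub(), which rewrites every match. A small illustration on two labels taken from the example data (the variable names are just for this sketch):

```r
labels <- c("[0,60)", "[120,180)")

# grepl() only reports whether the pattern occurs; labels is unchanged.
found <- grepl("[1-9]", labels)
found   # TRUE TRUE

# sub() rewrites only the first number that starts with a non-zero digit...
first <- sub("([1-9][0-9]*)", "\\1K", labels)
first   # c("[0,60K)", "[120K,180)")

# ...while gsub() rewrites every such number.
every <- gsub("([1-9][0-9]*)", "\\1K", labels)
every   # c("[0,60K)", "[120K,180K)")
```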

Answer:

Yeah, we can do it like this:

df$x1 = gsub(pattern = "([1-9][0-9]*)", replacement = "\\1K", x = df$x1)
df
#            x1    x2   y
# 1     [0,60K) 20.00  50
# 2     [0,60K) 30.50 100
# 3 [120K,180K) 40.50 200
# 4 [120K,180K) 20.12 400
# 5 [120K,180K) 25.00 500
# 6 [120K,180K) 86.00 600
# 7 [540K,600K) 75.00 700
# 8 [840K,900K) 45.00 800

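The ([1-9][0-9]*) pattern matches a non-zero digit followed by any number of further digits, so it matches every number that does not start with 0 (thus skipping your 0s, as desired). Note that gsub() returns a character vector, so after this assignment x1 is no longer a factor; wrap the call in factor() if you need it to stay one.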
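If x1 should remain a factor, one option is to apply the same pattern to the levels rather than to the column, so the regex runs over the 15 level labels instead of every row. A minimal sketch, using a three-row cut-down of the question's data for illustration:

```r
df <- read.table(text = "x1 x2 y
[0,60) 20 50
[120,180) 40.5 200
[540,600) 75 700", header = TRUE)
df$x1 <- as.factor(df$x1)

# Rewrite only the level labels; the underlying integer codes are untouched,
# so every row keeps its level and x1 stays a factor.
levels(df$x1) <- gsub("([1-9][0-9]*)", "\\1K", levels(df$x1))

is.factor(df$x1)  # TRUE
levels(df$x1)     # "[0,60K)" "[120K,180K)" "[540K,600K)"
```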

Answer:

Replace each occurrence of , or ) with K followed by that character. (Note that, unlike what the question asks, this also adds a K after the 0.)

transform(df, x1 = gsub("([,)])", "K\\1", x1))

giving:

           x1    x2   y
1    [0K,60K) 20.00  50
2    [0K,60K) 30.50 100
3 [120K,180K) 40.50 200
4 [120K,180K) 20.12 400
5 [120K,180K) 25.00 500
6 [120K,180K) 86.00 600
7 [540K,600K) 75.00 700
8 [840K,900K) 45.00 800
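To keep the delimiter-based idea but leave a zero lower bound alone, as the question asks, one possibility is a negative lookbehind, which in base R requires perl = TRUE. This sketch assumes that the only zero bound is a literal "[0" at the start of an interval, as in the example data:

```r
x1 <- c("[0,60)", "[120,180)", "[840,900)")

# Insert K before each "," or ")" unless the two preceding characters are
# "[0" (a zero lower bound). Lookbehind syntax needs the PCRE engine.
out <- gsub("(?<!\\[0)([,)])", "K\\1", x1, perl = TRUE)
out  # "[0,60K)" "[120K,180K)" "[840K,900K)"
```

On a data frame the same call can go inside transform(), exactly like the answer above.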