I have a factor variable similar to the one in the example data set below. There are 15 levels in my actual data set and each level is an interval. I would like to add a "K" to the end of each number (except 0) within those entries.
df <- read.table(text = "x1 x2 y
[0,60) 20 50
[0,60) 30.5 100
[120,180) 40.5 200
[120,180) 20.12 400
[120,180) 25 500
[120,180) 86 600
[540,600) 75 700
[840,900) 45 800", header = TRUE)
df$x1 <- as.factor(df$x1)
Ideal output, where each non-zero number has a "K" after it:
df <- read.table(text = "x1 x2 y
[0,60K) 20 50
[0,60K) 30.5 100
[120K,180K) 40.5 200
[120K,180K) 20.12 400
[120K,180K) 25 500
[120K,180K) 86 600
[540K,600K) 75 700
[840K,900K) 45 800", header = TRUE)
Is there any easy way to do this with grepl or something?
CodePudding user response:
Yeah, we can do it like this:
df$x1 = gsub(pattern = "([1-9][0-9]*)", replacement = "\\1K", x = df$x1)
df
# x1 x2 y
# 1 [0,60K) 20.00 50
# 2 [0,60K) 30.50 100
# 3 [120K,180K) 40.50 200
# 4 [120K,180K) 20.12 400
# 5 [120K,180K) 25.00 500
# 6 [120K,180K) 86.00 600
# 7 [540K,600K) 75.00 700
# 8 [840K,900K) 45.00 800
The ([1-9][0-9]*) pattern matches a non-zero digit optionally followed by additional digits, so we match all numbers not starting with 0 (thus skipping your 0s, as desired).
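One thing to watch: gsub() returns a character vector, so after the assignment above x1 is no longer a factor. If it should stay a factor, a possible variation is to rewrite the level labels instead of the column values. This is only a sketch on a shortened copy of the example data (three rows chosen for illustration), not something from the original answer:

```r
# Shortened copy of the example data (assumption: three rows are enough to illustrate)
df <- read.table(text = "x1 x2 y
[0,60) 20 50
[120,180) 40.5 200
[540,600) 75 700", header = TRUE)
df$x1 <- as.factor(df$x1)

# Apply the same pattern to the level labels only; the column stays a factor
levels(df$x1) <- gsub("([1-9][0-9]*)", "\\1K", levels(df$x1))

levels(df$x1)
is.factor(df$x1)
```

Because only the labels change, each row keeps its level and the work is done once per level rather than once per row.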
CodePudding user response:
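Replace each occurrence of , or ) with K followed by that character. Note that this also appends K to the 0, so the first interval becomes [0K,60K) rather than the requested [0,60K).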
transform(df, x1 = gsub("([,)])", "K\\1", x1))
giving:
x1 x2 y
1 [0K,60K) 20.00 50
2 [0K,60K) 30.50 100
3 [120K,180K) 40.50 200
4 [120K,180K) 20.12 400
5 [120K,180K) 25.00 500
6 [120K,180K) 86.00 600
7 [540K,600K) 75.00 700
8 [840K,900K) 45.00 800
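If the 0 has to stay bare, one way to adapt this approach is a Perl-style negative lookbehind that skips a delimiter when it directly follows a lone 0. This is a sketch rather than part of the original answer; the vector x is a hypothetical stand-in for df$x1, and it assumes a lone 0 is always preceded by [ or ,:

```r
# Hypothetical stand-in for the interval labels in df$x1
x <- c("[0,60)", "[120,180)", "[100,200)")

# (?<![\[,]0) rejects a , or ) that comes right after "[0" or ",0",
# i.e. after a number that is exactly 0; perl = TRUE enables lookbehind
res <- gsub("(?<![\\[,]0)([,)])", "K\\1", x, perl = TRUE)
res
# "[0,60K)" "[120K,180K)" "[100K,200K)"
```

Numbers that merely end in 0, such as 100, still get their K because the character before their final 0 is a digit, not [ or ,.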
