Note: Dates are formatted as DD.MM.
I have the closing prices for a number of companies (here: A, B, C) for a time frame (here: Jan 1st to Jan 5th). The df looks like this:
df1 <- data.frame(date = c("01.01.", "02.01.", "03.01.", "04.01.", "05.01."),
A = c(102, 103, 107, 120, 134),
B = c(94, 95, 100, 93, 90),
C = c(55, 53, 50, 51, 48))
I want to normalize the data using the z-score, "z = (x - μ) / σ". For A on 01.01., this would be (102 - 113.2) / 13.66382 = -0.8197, where 113.2 is the mean of A and 13.66382 is its sample standard deviation.
How do I apply this to all my observations? I'm guessing with the mutate function in dplyr, but I can't seem to figure out how to actually do it.
CodePudding user response:
In dplyr, I think you'll need to use something like across(c(A,B,C), ...).
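A minimal sketch of that dplyr approach, assuming dplyr 1.0.0 or later (where across() is available). The names z_score and df1_z are chosen here for illustration and are not part of the question:

```r
library(dplyr)

# Example data from the question
df1 <- data.frame(date = c("01.01.", "02.01.", "03.01.", "04.01.", "05.01."),
                  A = c(102, 103, 107, 120, 134),
                  B = c(94, 95, 100, 93, 90),
                  C = c(55, 53, 50, 51, 48))

# z-score: subtract the column mean, divide by the sample standard deviation
z_score <- function(x) (x - mean(x)) / sd(x)

# Apply z_score to columns A, B and C, overwriting them in the result
df1_z <- df1 %>%
  mutate(across(c(A, B, C), z_score))

df1_z
```

Unlike the by-reference update below, this leaves df1 untouched and stores the normalized columns in a new data frame.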
Just to offer an alternative method using data.table, which updates the table by reference, i.e. there is no need to write something like df1 <- df1 %>% ... in this situation.
library(data.table)
setDT(df1)  # convert df1 to a data.table in place
cols <- c("A", "B", "C")
# overwrite each column named in cols with its z-score
df1[, (cols) := lapply(.SD, function(x) (x - mean(x)) / sd(x)), .SDcols = cols]
df1
     date          A          B          C
1: 01.01. -0.8196829 -0.1096817  1.3324198
2: 02.01. -0.7464969  0.1645225  0.5921866
3: 03.01. -0.4537530  1.5355438 -0.5181632
4: 04.01.  0.4976646 -0.3838859 -0.1480466
5: 05.01.  1.5222682 -1.2064987 -1.2583965
CodePudding user response:
Actually, no package is required at all; write a function and lapply it over the respective columns.
z <- \(x) (x - mean(x)) / sd(x)
transform(df1, z=lapply(df1[-1], z))
#     date   A   B  C        z.A        z.B        z.C
# 1 01.01. 102  94 55 -0.8196829 -0.1096817  1.3324198
# 2 02.01. 103  95 53 -0.7464969  0.1645225  0.5921866
# 3 03.01. 107 100 50 -0.4537530  1.5355438 -0.5181632
# 4 04.01. 120  93 51  0.4976646 -0.3838859 -0.1480466
# 5 05.01. 134  90 48  1.5222682 -1.2064987 -1.2583965
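Another base R option, sketched here on the assumption that the sample standard deviation (the n - 1 denominator that sd() uses) is what is wanted: scale() centers each column on its mean and divides by its standard deviation by default. The names cols and df1_scaled are introduced for illustration only:

```r
# Example data from the question
df1 <- data.frame(date = c("01.01.", "02.01.", "03.01.", "04.01.", "05.01."),
                  A = c(102, 103, 107, 120, 134),
                  B = c(94, 95, 100, 93, 90),
                  C = c(55, 53, 50, 51, 48))

cols <- c("A", "B", "C")

# Work on a copy so the original prices stay available
df1_scaled <- df1

# scale() returns a matrix of z-scores; assigning it back replaces the columns
df1_scaled[cols] <- scale(df1[cols])

df1_scaled
```

This overwrites the price columns in the copy instead of appending z.A, z.B and z.C, which may be preferable when only the normalized values are needed.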

