I use this code to calculate z-scores:
data_instability <- data_2 %>%
  na.omit() %>%
  mutate(zdiff_mins = (diff_mins - mean(diff_mins)) / sd(diff_mins))
However, when I inspect the data frame in RStudio, the z-scores that are returned are clearly not correct. For example, a "diff_mins" of 8724.067 returns a z-score of 4.93, while 9501.717 gives a z-score of 3.26. Since 9501.717 is higher than 8724.067, it should return a higher z-score.
The output of dput(data_2[,"diff_mins"]) is too long to post here, so it is available at: https://docs.google.com/document/d/1OZCcNn2U0C6wkpBpEfSn316v3HhhXzu6eqxSughXhBU/edit?usp=sharing
CodePudding user response:
I found out that data_2 was a "grouped_df", so mutate() computed mean() and sd() within each group rather than over the whole column. Ungrouping first fixed it:
data_instability <- data_2 %>%
  ungroup() %>%
  na.omit() %>%
  mutate(zdiff_mins = (diff_mins - mean(diff_mins)) / sd(diff_mins))
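To illustrate why grouping produces this symptom, here is a minimal sketch with made-up data. The id column, the values, and the helper z() are hypothetical and not taken from the question; only the z-score formula is the same.

```r
library(dplyr)

# Hypothetical data: two groups on very different scales,
# left grouped the way data_2 apparently was.
toy <- tibble(
  id        = c("a", "a", "a", "b", "b", "b"),
  diff_mins = c(10, 20, 30, 1000, 2000, 9000)
) %>%
  group_by(id)

# Same formula as in the question, wrapped in a helper for brevity.
z <- function(x) (x - mean(x)) / sd(x)

# On a grouped data frame, mean() and sd() are evaluated per group.
grouped <- toy %>%
  mutate(zdiff_mins = z(diff_mins))

# After ungroup(), they are evaluated over the whole column.
overall <- toy %>%
  ungroup() %>%
  mutate(zdiff_mins = z(diff_mins))

is_grouped_df(toy)  # TRUE: a quick way to check for leftover grouping
print(grouped)
print(overall)
```

In the grouped result, 30 gets a z-score of 1 while 1000 gets a negative one, because each value is compared only with the other values in its own group. In the ungrouped result the z-scores increase with diff_mins, as expected.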
CodePudding user response:
This is hard to reproduce without the actual data. When I run the following script:
library(dplyr)  # provides tibble(), mutate() and %>%

data <- tibble(diff_mins = c(25579.217, 9501.717, 8724.067))
data_instability <- data %>%
  mutate(zdiff_mins = (diff_mins - mean(diff_mins)) / sd(diff_mins))
I get the result:
# A tibble: 3 x 2
  diff_mins zdiff_mins
      <dbl>      <dbl>
1    25579.      1.15
2     9502.     -0.536
3     8724.     -0.618
Here the z-score for 9502 is higher than the one for 8724, as expected. Do you by any chance apply other calculations (such as taking the absolute value) after you calculate zdiff_mins?
