R: Calculating z-scores returns wrong values

I use this code to calculate z-scores:

data_instability <- data_2 %>%
  na.omit() %>%
  mutate(zdiff_mins = (diff_mins - mean(diff_mins))/sd(diff_mins))

However, when I inspect the data frame in RStudio, the z-scores that are returned are clearly not correct. For example, a "diff_mins" of 8724.067 returns a z-score of 4.93, while 9501.717 gives a z-score of 3.26. 9501.717 is higher than 8724.067 and should therefore return a higher z-score.

The output of dput(data_2[,"diff_mins"]) is too long to post on Stack Overflow, so it is available here: https://docs.google.com/document/d/1OZCcNn2U0C6wkpBpEfSn316v3HhhXzu6eqxSughXhBU/edit?usp=sharing

CodePudding user response：

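I found out that data_2 was a grouped data frame (class "grouped_df"), so mutate() computed mean() and sd() within each group rather than over the whole column. Ungrouping first fixed it: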

data_instability <- data_2 %>%
  ungroup() %>%
  na.omit() %>%
  mutate(zdiff_mins = (diff_mins - mean(diff_mins)) / sd(diff_mins))
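
To see why grouping produces z-scores that look out of order, here is a minimal sketch on made-up data. The `site` column, the `toy` tibble, and the `zscore()` helper are all hypothetical and are not taken from the question; they only illustrate how mutate() behaves on a grouped versus an ungrouped frame.

```r
library(dplyr)

# Hypothetical data: `site` is an invented grouping column.
toy <- tibble(
  site      = c("a", "a", "a", "b", "b", "b"),
  diff_mins = c(10, 20, 30, 1000, 2000, 9000)
)

# Helper for illustration only: standardise a numeric vector.
zscore <- function(x) (x - mean(x)) / sd(x)

# On a grouped frame, mean() and sd() are evaluated per group,
# so each value is compared only with the values in its own group.
grouped <- toy %>%
  group_by(site) %>%
  mutate(zdiff_mins = zscore(diff_mins))

# After ungroup(), mean() and sd() use the whole column.
ungrouped <- grouped %>%
  ungroup() %>%
  mutate(zdiff_mins = zscore(diff_mins))

print(grouped)
print(ungrouped)
```

In the grouped result, 30 is the largest value in site "a" and gets a positive z-score, while 1000 is the smallest value in site "b" and gets a negative one, even though 1000 is larger than 30. In the ungrouped result the z-scores increase with diff_mins.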

CodePudding user response：

This is hard to reproduce without the actual data. When I run the following script:

library(dplyr)  # provides %>% and mutate(), and re-exports tibble()

data <- tibble(diff_mins = c(25579.217, 9501.717, 8724.067))

data_instability <- data %>%
  mutate(zdiff_mins = (diff_mins - mean(diff_mins)) / sd(diff_mins))

I get the result:

# A tibble: 3 x 2
  diff_mins zdiff_mins
      <dbl>      <dbl>
1    25579.      1.15 
2     9502.     -0.536
3     8724.     -0.618

Here the z-score for 9502 is higher than the one for 8724, as expected. Do you by any chance apply other calculations (such as taking the absolute value) after you calculate zdiff_mins?
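
Since the example above behaves as expected, it may also be worth checking whether the input frame carries groups before calling mutate(). The sketch below uses a stand-in for data_2 with a hypothetical `subject` column and invented values; only the inspection calls at the end are the point.

```r
library(dplyr)

# Stand-in for the asker's data; `subject` and the values are hypothetical.
data_2 <- tibble(
  subject   = c(1, 1, 2, 2),
  diff_mins = c(8724.067, 9501.717, 25579.217, 120.5)
) %>%
  group_by(subject)

# TRUE when the frame is grouped, so summary functions run per group.
print(is_grouped_df(data_2))

# Names of the grouping columns (empty for an ungrouped frame).
print(group_vars(data_2))

# A grouped tibble lists "grouped_df" among its classes.
print(class(data_2))
```

If the first call reports a grouped frame, mean() and sd() inside mutate() are computed per group, which would explain z-scores that do not follow the ordering of diff_mins across the whole column.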