I have a large dataset, data, with many non-numeric columns x1, x2, ..., x30 and a numeric column y.
I would like to compute the mean absolute deviation (MAD) of y for each combination of x1 and x2.
Say, for x1 == 'A' and x2 == 'B', I want to compute the MAD of y. I did:
data %>%
group_by(x1, x2) %>%
filter(x1 == "A", x2 == "B") %>%
summarise(mad = mad(y, center = mean(y)))
However, when I compute it manually, it returns a different value:
data %>%
group_by(x1, x2) %>%
filter(x1 == "A", x2 == "B") %>%
summarise(manual_mad = sum(abs(y - mean(y)))/n())
Which computation is correct, and how should I adjust one or the other so that both return the same value?
CodePudding user response:
From the documentation of ?mad:
The actual value calculated is constant * cMedian(abs(x - center)).
Indeed, with 1.4826 being the default constant value, we get the same result manually:
y = 1:10
mad(y, center = mean(y))
#[1] 3.7065
1.4826 * median(abs(y - mean(y)))
#[1] 3.7065
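To see why this differs from the manual computation in the question, it may help to compare the median-based and mean-based quantities on a skewed vector. The following is a minimal sketch using a made-up vector y (not data from the question). Setting constant = 1 removes the scaling, but the result is still a median of absolute deviations, not a mean:

```r
# Made-up, deliberately skewed example vector (an assumption for illustration)
y <- c(1, 2, 3, 4, 100)

# mad() with center = mean(y): scaled MEDIAN of absolute deviations from the mean
scaled <- mad(y, center = mean(y))
scaled
#[1] 29.652

# constant = 1 drops the 1.4826 scaling, but it is still a median
unscaled <- mad(y, center = mean(y), constant = 1)
unscaled
#[1] 20

# The manual computation from the question: MEAN of absolute deviations
mean_dev <- sum(abs(y - mean(y))) / length(y)
mean_dev
#[1] 31.2
```

So no choice of arguments to mad() turns it into a mean; the two computations only agree when the median and the mean of the absolute deviations happen to coincide.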
CodePudding user response:
Apparently you are looking for the mean absolute deviation, which is defined as:
MAD = Σ(|x_i - μ|)/n
mnad <- \(x, ...) mean(abs(x - mean(x, ...)), ...)
mnad(1:9)
# [1] 2.222222
The mad() function instead calculates the median absolute deviation, scaled by a constant (1.4826 by default).
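A helper like this can be dropped into the grouped pipeline from the question. The sketch below assumes dplyr is installed and uses a small made-up data frame in place of the real data. The helper is written with an explicit na.rm argument instead of `...`, which is a design choice made here for readability:

```r
library(dplyr)

# Mean absolute deviation around the mean; na.rm is passed to both mean() calls
mnad <- function(x, na.rm = FALSE) {
  mean(abs(x - mean(x, na.rm = na.rm)), na.rm = na.rm)
}

# Made-up stand-in for the real dataset (an assumption for illustration)
data <- data.frame(
  x1 = c("A", "A", "A", "A", "C", "C"),
  x2 = c("B", "B", "B", "B", "D", "D"),
  y  = c(1, 2, 3, 10, 5, 7)
)

result <- data %>%
  filter(x1 == "A", x2 == "B") %>%
  group_by(x1, x2) %>%
  summarise(
    mean_ad    = mnad(y),                      # helper-based version
    manual_mad = sum(abs(y - mean(y))) / n(),  # manual version from the question
    .groups    = "drop"
  )

result
```

For the A/B group the mean of y is 4, the absolute deviations are 3, 2, 1 and 6, and both columns come out as 3. Dropping the filter() step would give one row per x1/x2 combination.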
