I have a data frame like the one below:
monkey = data.frame(girl = 1:10, kn = NA, boy = 5)
I want to understand, step by step, what the following code does:
monkey %>%
mutate(t = ifelse(is.na(kn),.[,grepl('a',names(.))],ll))
Thank you everyone in advance for your support.
CodePudding user response:
In my opinion, this is not good code, but I'll try to explain what it is doing.
`is.na(kn)` (in the context of `monkey`) returns a logical vector indicating whether each value in that column is `NA`:
with(monkey, is.na(kn))
# [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

The `.` in `.[,grepl(*)]` refers to the data as it was at the start of this call to `mutate`. It would be more dplyr-canonical to use `cur_data()` (deprecated since dplyr 1.1.0 in favor of `pick()`), which is also more complete: it takes into account columns created earlier in the same `mutate` call, which `.` does not see (not a factor here). I believe this `.[*]` code is trying to select a column dynamically based on the current data.

Why this one is bad:
1. There is no column here whose name contains "a".
2. There could be more than one column whose name contains "a", in which case the `yes=` argument to `ifelse` would produce a nested frame in the new `t=` column.
3. The behavior of `.[,*]` changes depending on whether the original frame is a base-R `data.frame` or the tibble variant `tbl_df`: compare `monkey[,1]` with `tibble(monkey)[,1]`.

The `no=` argument refers to an object `ll` that is not defined. Intuitively, this should fail with `Error: object 'll' not found` or similar, but since every value of the `test=` argument is `TRUE`, the `no=` argument is not needed and so it is never evaluated. Consider `ifelse(c(TRUE, TRUE), 1:2, stop("oops"))` (no error) versus `ifelse(c(TRUE, FALSE), 1:2, stop("oops"))` (error).
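The points above can be checked one at a time in base R. This is only a sketch using the toy data from the question; the intermediate names (`test`, `has_a`, `lazy_ok`, `lazy_err`) are made up for illustration and are not part of the original code.

```r
# Same toy data as in the question.
monkey <- data.frame(girl = 1:10, kn = NA, boy = 5)

# The test is TRUE for every row, because kn is entirely NA.
test <- is.na(monkey$kn)
all(test)

# No column name contains "a", so the dynamic selection matches nothing.
has_a <- grepl("a", names(monkey))
any(has_a)

# Base data.frame indexing of a single column drops to a plain vector
# unless drop = FALSE is given; a tibble keeps the frame either way.
is.data.frame(monkey[, 1])
is.data.frame(monkey[, 1, drop = FALSE])

# ifelse() only evaluates no= when at least one test value is FALSE.
lazy_ok <- ifelse(c(TRUE, TRUE), 1:2, stop("oops"))
lazy_err <- tryCatch(
  ifelse(c(TRUE, FALSE), 1:2, stop("oops")),
  error = function(e) conditionMessage(e)
)
```

Here `lazy_ok` holds the `yes=` values because `stop("oops")` is never reached, while `lazy_err` captures the error message from the second call.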
Ultimately, this code is not defensive enough to be safe (it behaves differently on a base data.frame than on a tibble), and its intent is unclear.
My advice when using dplyr is to use dplyr::if_else instead of base R's ifelse. For one, ifelse has some issues and limitations (e.g., How to prevent ifelse() from turning Date objects into numeric objects); for another, if_else protects you from ambiguous, inconsistent-results code such as in your question.
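As a rough illustration of the difference, the sketch below shows base `ifelse` dropping the `Date` class, then (only if dplyr happens to be installed) the same call with `dplyr::if_else`. The date value and the variable names are arbitrary choices for the example, and the last call reuses the undefined `ll` from the question on purpose.

```r
# Base ifelse() strips attributes, so a Date comes back as a bare number.
d <- as.Date("2024-01-15")
base_result <- ifelse(TRUE, d, d)
class(base_result)  # "numeric"

# The dplyr part is guarded so the sketch still runs without the package.
if (requireNamespace("dplyr", quietly = TRUE)) {
  # if_else() keeps the type of its inputs.
  dplyr_result <- dplyr::if_else(TRUE, d, d)
  print(class(dplyr_result))

  # if_else() evaluates both branches, so the undefined object ll
  # should raise an error here instead of being silently skipped.
  monkey <- data.frame(girl = 1:10, kn = NA, boy = 5)
  outcome <- tryCatch(
    dplyr::if_else(is.na(monkey$kn), monkey$boy, ll),
    error = function(e) "error raised"
  )
  print(outcome)
}
```

The eager evaluation is the design point: a mistake such as a misspelled or missing object surfaces immediately rather than depending on the data that happens to be in the test column.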
