Data Setup
I have a data set that appears somewhat like this simple dataframe below:
CAD_EXCHANGE <- 1.34
EUR_EXCHANGE <- 0.88
df <- tibble(
shipment = c("A", "B", "C", "D", "E"),
invoice = c(rep(500, 5)),
currency = factor(c("USD", "EUR", "CAD", NA, "SDD"))
)
df
# A tibble: 5 x 3
shipment invoice currency
<chr> <dbl> <fct>
1 A 500 USD
2 B 500 EUR
3 C 500 CAD
4 D 500 NA
5 E 500 SDD
levels(df$currency)
[1] "CAD" "EUR" "SDD" "USD"
End Goal
I am trying to convert the invoices to USD for some common other currencies (EUR and CAD), but not all of them or if data is missing (i.e., SDD and NA). My final data frame should look like this:
# A tibble: 5 x 5
shipment invoice currency invoice_converted currency_converted
<chr> <dbl> <fct> <dbl> <fct>
1 A 500 USD 500 USD
2 B 500 EUR 568 USD
3 C 500 CAD 373 USD
4 D 500 NA 500 NA
5 E 500 SDD 500 SDD
Trial 1 -- Does Not Work
In the future, I may have more than just these few currencies to convert, so I applied a case_when() statement. This was my first attempt:
df_USD1 <- df %>%
mutate(
invoice_converted = case_when(
currency == "EUR" ~ round(invoice / EUR_EXCHANGE),
currency == "CAD" ~ round(invoice / CAD_EXCHANGE),
TRUE ~ invoice
),
currency_converted = case_when(currency == "EUR" ~ "USD",
currency == "CAD" ~ "USD",
TRUE ~ currency)
)
Error: Problem with `mutate()` column `currency_converted`.
i `currency_converted = case_when(...)`.
x must be a character vector, not a `factor` object.
With the above, I understand that I'm mixing character and factor in the assignment to currency_converted because I have the default TRUE ~ currency (and currency is a factor). So I tried using only factors for the assignment...
Trial 2 -- Works, but not in a reliable way
df_USD2 <- df %>%
mutate(
invoice_converted = case_when(
currency == "EUR" ~ round(invoice / EUR_EXCHANGE),
currency == "CAD" ~ round(invoice / CAD_EXCHANGE),
TRUE ~ invoice
),
currency_converted = case_when(
currency == "EUR" ~ currency[1],
currency == "CAD" ~ currency[1],
TRUE ~ currency)
)
It works, but only because in my setup for this question, USD is in the first position, and I cannot rely on that.
> df$currency
[1] USD EUR CAD <NA> SDD
Levels: CAD EUR SDD USD
Trial 3 -- Doesn't work
I thought I could try some other way of getting at the factor with subsetting, but this doesn't work:
df_USD3 <- df %>%
mutate(
invoice_converted = case_when(
currency == "EUR" ~ round(invoice / EUR_EXCHANGE),
currency == "CAD" ~ round(invoice / CAD_EXCHANGE),
TRUE ~ invoice
),
currency_converted = case_when(
currency == "EUR" ~ df$currency[df$currency == "USD"],
currency == "CAD" ~ df$currency[df$currency == "USD"],
TRUE ~ currency
)
)
Error: Problem with `mutate()` column `currency_converted`.
i `currency_converted = factor(...)`.
x `currency == "EUR" ~ df$currency[df$currency == "USD"]`, `currency == "CAD" ~ df$currency[df$currency == "USD"]` must be length 5 or one, not 2.
Run `rlang::last_error()` to see where the error occurred.
And it seems to be that it's because of the NA that gets returned...
> df$currency[df$currency == "USD"]
[1] USD <NA>
Levels: CAD EUR SDD USD
...because if I go back to my original df and replace that NA with some other currency, it would work -- but obviously I need to be able to keep NA where it belongs.
I feel like there's some very good way to do this, but I'm missing it despite reading up on factors and trying different things. Help?
CodePudding user response:
case_when doesn't do the type conversion automatically - i.e. currency is factor whereas the returns from other conditions in case_when is just character. So, we can force convert the currency to character to make all the returns same class and it should work
library(dplyr)
df %>%
mutate(
invoice_converted = case_when(
currency == "EUR" ~ round(invoice / EUR_EXCHANGE),
currency == "CAD" ~ round(invoice / CAD_EXCHANGE),
TRUE ~ invoice
), currency_converted = case_when(currency == "EUR" ~ "USD",
currency == "CAD" ~ "USD",
TRUE ~ as.character(currency)))
-output
# A tibble: 5 × 5
shipment invoice currency invoice_converted currency_converted
<chr> <dbl> <fct> <dbl> <chr>
1 A 500 USD 500 USD
2 B 500 EUR 568 USD
3 C 500 CAD 373 USD
4 D 500 <NA> 500 <NA>
5 E 500 SDD 500 SDD
If we want to keep it as a factor, either wrap with factor after the case_when or directly use fct_recode instead of case_when
library(forcats)
df %>%
mutate(
invoice_converted = case_when(
currency == "EUR" ~ round(invoice / EUR_EXCHANGE),
currency == "CAD" ~ round(invoice / CAD_EXCHANGE),
TRUE ~ invoice
), currency_converted = fct_recode(currency, USD = "EUR", USD = "CAD"))
-output
# A tibble: 5 × 5
shipment invoice currency invoice_converted currency_converted
<chr> <dbl> <fct> <dbl> <fct>
1 A 500 USD 500 USD
2 B 500 EUR 568 USD
3 C 500 CAD 373 USD
4 D 500 <NA> 500 <NA>
5 E 500 SDD 500 SDD
