I get a compiler error if I write this:
df.filter($"foo" == lit(0))
because I forgot that Spark needs a triple equals (===) for column comparisons.
However, if I do this, I get the wrong answer but no error:
df.filter($"foo".between(baz, quux) || $"foo" == lit(0))
Can someone explain why compile-time checks help me in the first case, but not the second?
CodePudding user response:
Because $"foo" == lit(0) is plain Scala equality between two Column objects, so it always evaluates to Boolean = false.
So in the first case, you are trying to call the filter method with a Boolean, whereas it expects a string expression or a column expression. Thus you get a compile error.
Now, in the second case:
$"foo".between(baz, quux) || $"foo" == lit(0) is evaluated as:
(((foo >= baz) AND (foo <= quux)) OR false)
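which is accepted because == binds more tightly than ||, so you are doing an OR (||) between a column expression ($"foo".between(baz, quux)) and a literal Boolean false. Column's || method takes an argument of type Any, so the compiler has nothing to reject.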
In other words, it is interpreted as $"foo".between(baz, quux) || lit(false)
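To make the difference concrete, here is a minimal sketch that runs both versions next to the === fix. The object name EqualsDemo, the sample values 0, 2, 5 and the bounds 1 and 3 are made up for illustration (they stand in for baz and quux), and it assumes a local Spark session is available:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

// Hypothetical demo object; names and sample data are illustrative only.
object EqualsDemo {
  def run(spark: SparkSession): (Boolean, Seq[Int], Seq[Int]) = {
    import spark.implicits._

    val df = Seq(0, 2, 5).toDF("foo")

    // `==` is ordinary Scala equality on two Column objects, so the result is a Boolean.
    val plain: Boolean = $"foo" == lit(0)

    // Compiles: Column.|| takes `other: Any`, so the Boolean is wrapped as a literal.
    val buggy = df.filter($"foo".between(1, 3) || $"foo" == lit(0))

    // `===` returns a Column, so the comparison is evaluated per row.
    val fixed = df.filter($"foo".between(1, 3) || $"foo" === lit(0))

    (plain, buggy.as[Int].collect().toSeq, fixed.as[Int].collect().toSeq)
  }

  def main(args: Array[String]): Unit = {
    // Local session, assumed here only so the sketch is self-contained.
    val spark = SparkSession.builder().master("local[1]").appName("equals-demo").getOrCreate()
    val (plain, buggy, fixed) = run(spark)
    println(s"plain = $plain, buggy = $buggy, fixed = $fixed")
    spark.stop()
  }
}
```

With this sample data, the == version should keep only the row where foo is 2, while the === version should also keep the row where foo is 0.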
