When a file is read through fread, columns may be read as integer64 (correctly so), but when these are multiplied by a numeric, the result is not upcast to numeric (as it would be in C, or with integers in R). This is documented behavior in the bit64 package, but it is not intuitive: in arithmetic such as multiplication, integer64 behaves differently from integer.
Also, integer64 divided by an integer gives a numeric. So the behavior is very bizarre!
Should we then always call fread with colClasses = "numeric" for columns that will be used in arithmetic expressions with numerics?
file contents
x,y
111,0.3
2147483648,0.3
> d <- fread(file)
> print(d$x*d$y)
x y
1: 111 0.3
2: 2147483648 0.3
> as.integer64(111) * 8e-2
integer64
[1] 9
> as.integer64(111) * 8 / 1e2
8.88
Similarly, quantile and other R functions will not behave correctly with integer64. This issue creeps into all classes that build on integer64, such as nanotime.
User response:
This is the documented behaviour of the bit64 package; see "Arithmetic precision and coercion" in ?bit64:
The fact that we introduce 64 bit long long integers – without introducing 128-bit long doubles – creates some subtle challenges
The multiplication operator * coerces its first argument to integer64 but allows its second argument to be also double: the second argument is internaly coerced to 'long double' and the result of the multiplication is returned as integer64
as.integer64(111) * 8e-2
integer64
[1] 9
The division / and power ^ operators also coerce their first argument to integer64 and coerce internally their second argument to 'long double', they return as double
as.integer64(111) * 8 / 1e2
8.88
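The two rules quoted above can be checked by looking at the class of each result. This is only a minimal sketch, assuming bit64 is installed and behaves as the quoted documentation describes; the variable name x is just for illustration.

```r
library(bit64)

x <- as.integer64(111)

# Multiplication: the double operand is used internally, but the result
# is returned as integer64, so the fractional part is not kept.
class(x * 8e-2)

# Division: the result is returned as a plain double ("numeric").
class(x / 1e2)
```

So the surprise comes from the return type of each operator, not from the way the double operand is read.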
To avoid this, you could set the integer64 parameter of fread to "double". Use this with care, as there is an open issue.
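As a rough sketch of that option, the example below reads the same data as in the question, passed inline through fread's text argument instead of a file. The names csv and d are illustrative only.

```r
library(data.table)

# Same contents as the file in the question, supplied inline.
csv <- "x,y\n111,0.3\n2147483648,0.3\n"

# integer64 = "double" asks fread to read columns it would otherwise
# detect as integer64 as double instead.
d <- fread(text = csv, integer64 = "double")

class(d$x)   # a plain numeric column, not integer64
d$x * d$y    # ordinary double arithmetic, fractional part kept
```

Keep in mind that a double represents integers exactly only up to 2^53, so this is not a safe choice for columns holding larger values such as 64-bit identifiers.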
