I am trying to calculate the z-score of a pandas DataFrame using scipy's zscore function. The calculation succeeds, but the type of the returned object differs depending on which host the program runs on.
My guess is that this is related to the different versions of the packages involved.
However, I haven't been able to find the actual reason for the difference.
- Why does the returned type differ between the two hosts?
| Host 1 | Host 2 |
|---|---|
| python 3.6.8 | python 3.7.3 |
| pandas 1.1.5 | pandas 1.3.1 |
| numpy 1.19.5 | numpy 1.19.2 |
| scipy 1.5.4 | scipy 1.7.3 |
Example:
Host 1
import numpy as np
import pandas as pd
from scipy.stats import zscore
df = pd.DataFrame(np.random.randint(100, 200, size=(5, 3)), columns=['A', 'B', 'C'])
# --------------------------------
In [5]: df
Out[5]:
A B C
0 166 135 141
1 156 110 167
2 104 159 114
3 150 156 157
4 163 113 180
In [10]: zscore(df)
Out[10]:
array([[ 0.80546745, 0.01940194, -0.47372066],
[ 0.36290292, -1.19321913, 0.66671797],
[-1.93843265, 1.18351816, -1.65802232],
[ 0.0973642 , 1.03800363, 0.22808773],
[ 0.67269809, -1.0477046 , 1.23693729]])
In [11]: zscore(df, ddof=0)
Out[11]:
array([[ 0.80546745, 0.01940194, -0.47372066],
[ 0.36290292, -1.19321913, 0.66671797],
[-1.93843265, 1.18351816, -1.65802232],
[ 0.0973642 , 1.03800363, 0.22808773],
[ 0.67269809, -1.0477046 , 1.23693729]])
In [12]: type(zscore(df))
Out[12]: numpy.ndarray
Host 2
import numpy as np
import pandas as pd
from scipy.stats import zscore
df = pd.DataFrame(np.random.randint(100, 200, size=(5, 3)), columns=['A', 'B', 'C'])
# --------------------------------
In [77]: df
Out[77]:
A B C
0 151 188 190
1 195 199 103
2 130 174 188
3 168 194 146
4 171 138 129
In [78]: zscore(df)
Out[78]:
A B C
0 -0.553990 0.428052 1.148875
1 1.477308 0.928963 -1.427210
2 -1.523474 -0.209472 1.089654
3 0.230829 0.701276 -0.153973
4 0.369327 -1.848819 -0.657346
In [79]: zscore(df, ddof=0)
Out[79]:
A B C
0 -0.553990 0.428052 1.148875
1 1.477308 0.928963 -1.427210
2 -1.523474 -0.209472 1.089654
3 0.230829 0.701276 -0.153973
4 0.369327 -1.848819 -0.657346
In [80]: type(zscore(df))
Out[80]: pandas.core.frame.DataFrame
Answer:
If we look at the source code of scipy's zscore in version 1.5.4 (as on Host 1), we can see that the input is first converted to a numpy array with np.asanyarray(a); that array is then processed and returned, so the result is always a numpy.ndarray. In version 1.7.3 (as on Host 2), zscore instead delegates to the zmap function, which calculates the z-score of the passed array or DataFrame while preserving its type, so a DataFrame input yields a DataFrame output.
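To make the difference concrete, here is a simplified sketch of the two code paths described above. The functions zscore_converting and zscore_preserving are hypothetical stand-ins written for illustration, not scipy's actual source; they assume axis=0 and ignore NaN handling. The data is the DataFrame from Host 1 in the question.

```python
import numpy as np
import pandas as pd


def zscore_converting(a, ddof=0):
    # Hypothetical stand-in for the older behaviour: the input is converted
    # to an ndarray first, so all later arithmetic yields an ndarray.
    a = np.asanyarray(a)
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=ddof)


def zscore_preserving(scores, ddof=0):
    # Hypothetical stand-in for the newer behaviour: mean and std are taken
    # from an array copy, but the arithmetic is applied to the original
    # object, so a DataFrame input stays a DataFrame.
    a = np.asanyarray(scores)
    mn = a.mean(axis=0, keepdims=True)
    std = a.std(axis=0, ddof=ddof, keepdims=True)
    return (scores - mn) / std


df = pd.DataFrame({
    "A": [166, 156, 104, 150, 163],
    "B": [135, 110, 159, 156, 113],
    "C": [141, 167, 114, 157, 180],
})

old_style = zscore_converting(df)
new_style = zscore_preserving(df)

print(type(old_style))  # numpy.ndarray
print(type(new_style))  # pandas DataFrame
```

The numbers are the same in both cases; only the container differs.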
In conclusion, the culprit for this behavior is the newer scipy version on Host 2. Hope this helps!
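If you need the same return type on both hosts without aligning the scipy versions, one option is to normalize the result yourself. Below is a minimal sketch; zscore_as_frame is a hypothetical helper name, and it has only been reasoned through for the two behaviours shown in the question (ndarray or DataFrame returned), not checked against every scipy release.

```python
import numpy as np
import pandas as pd
from scipy.stats import zscore


def zscore_as_frame(df, ddof=0):
    # Hypothetical helper: np.asarray turns either return type (ndarray or
    # DataFrame) into a plain ndarray, which is then wrapped back into a
    # DataFrame that carries the original index and column labels.
    values = np.asarray(zscore(df, ddof=ddof))
    return pd.DataFrame(values, index=df.index, columns=df.columns)


df = pd.DataFrame({
    "A": [166, 156, 104, 150, 163],
    "B": [135, 110, 159, 156, 113],
    "C": [141, 167, 114, 157, 180],
})

result = zscore_as_frame(df)
print(type(result))
print(result)
```

If you would rather always work with a plain array, np.asarray(zscore(df)) alone gives a numpy.ndarray in both cases.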
