I am having trouble passing numpy.float64 variables as extra arguments to pandas.Series.apply(). Is there any way to force pandas' own versions of .mean() and .std() to be used, in the hope that this will satisfy pandas?
The Code
def normalization(val_to_norm, col_mean, col_sd):
    return (val_to_norm - col_mean) / col_sd
voting_df['pop_estimate'].info()
pop_mean, pop_sd = voting_df['pop_estimate'].mean(), voting_df['pop_estimate'].std()
voting_df['pop_estimate'] = voting_df['pop_estimate'].apply(normalization, pop_mean, pop_sd)
Output
The key line is at the bottom.
<class 'pandas.core.series.Series'>
Int64Index: 3145 entries, 0 to 3144
Series name: pop_estimate
Non-Null Count Dtype
-------------- -----
3145 non-null float64
dtypes: float64(1)
memory usage: 49.1 KB
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In [46], line 7
4 voting_df['pop_estimate'].info()
6 pop_mean, pop_sd = voting_df['pop_estimate'].mean(), voting_df['pop_estimate'].std()
----> 7 voting_df['pop_estimate'] = voting_df['pop_estimate'].apply(normalization, pop_mean, pop_sd)
File c:\Users\chris\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\series.py:4774, in Series.apply(self, func, convert_dtype, args, **kwargs)
4664 def apply(
4665 self,
4666 func: AggFuncType,
(...)
4669 **kwargs,
4670 ) -> DataFrame | Series:
4671 """
4672 Invoke function on values of Series.
4673
(...)
4772 dtype: float64
4773 """
-> 4774 return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
File c:\Users\chris\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\apply.py:1100, in SeriesApply.apply(self)
1097 return self.apply_str()
1099 # self.f is Callable
-> 1100 return self.apply_standard()
File c:\Users\chris\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\apply.py:1151, in SeriesApply.apply_standard(self)
1149 else:
1150 values = obj.astype(object)._values
-> 1151 mapped = lib.map_infer(
1152 values,
1153 f,
1154 convert=self.convert_dtype,
1155 )
1157 if len(mapped) and isinstance(mapped[0], ABCSeries):
1158 # GH#43986 Need to do list(mapped) in order to get treated as nested
1159 # See also GH#25959 regarding EA support
1160 return obj._constructor_expanddim(list(mapped), index=obj.index)
File c:\Users\chris\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\_libs\lib.pyx:2919, in pandas._libs.lib.map_infer()
File c:\Users\chris\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\apply.py:139, in Apply.__init__.<locals>.f(x)
138 def f(x):
--> 139 return func(x, *args, **kwargs)
TypeError: Value after * must be an iterable, not numpy.float64
Answer:
To pass additional arguments to the function called by pd.Series.apply, you need to supply them either as keyword arguments or as a tuple through the args keyword argument.
From the docs:
Series.apply(func, convert_dtype=True, args=(), **kwargs)
Invoke function on values of Series.
Can be ufunc (a NumPy function that applies to the entire Series) or a Python function that only works on single values.
Parameters
func : function
    Python function or NumPy ufunc to apply.
convert_dtype : bool, default True
    Try to find better dtype for elementwise function results. If False, leave as dtype=object. Note that the dtype is always preserved for some extension array dtypes, such as Categorical.
args : tuple
    Positional arguments passed to func after the series value.
**kwargs
    Additional keyword arguments passed to func.
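The binding problem can be reproduced without pandas at all. The sketch below uses a hypothetical helper, apply_like, that only mimics the documented parameter order (func, convert_dtype, args, **kwargs) of the pandas version in the question's traceback; it is not pandas code, and later pandas releases may order these parameters differently, so treat it as an illustration of Python argument binding rather than of pandas internals.

```python
# Hypothetical helper: NOT part of pandas. It copies the parameter order
# func, convert_dtype=True, args=(), **kwargs from the documentation above
# so we can watch how Python binds positional arguments.
def apply_like(values, func, convert_dtype=True, args=(), **kwargs):
    # convert_dtype is accepted only to occupy the same slot; it is unused.
    # Each element is passed first, followed by *args and **kwargs, like the
    # f(x) wrapper shown in the question's traceback.
    return [func(x, *args, **kwargs) for x in values]


def normalization(val_to_norm, col_mean, col_sd):
    return (val_to_norm - col_mean) / col_sd


values = [1.0, 2.0, 3.0]

# Positional extras: 2.0 lands in convert_dtype and 1.0 lands in args,
# so unpacking *args fails because a float is not iterable.
try:
    apply_like(values, normalization, 2.0, 1.0)
    positional_failed = False
except TypeError as exc:
    positional_failed = True
    print("TypeError:", exc)

# Passing a tuple via args=, or naming the parameters, reaches normalization.
print(apply_like(values, normalization, args=(2.0, 1.0)))           # [-1.0, 0.0, 1.0]
print(apply_like(values, normalization, col_mean=2.0, col_sd=1.0))  # [-1.0, 0.0, 1.0]
```

The float type of the statistics is irrelevant here; what matters is which parameter each positional value is bound to.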
So to call this with positional arguments:
voting_df['pop_estimate'].apply(normalization, args=(pop_mean, pop_sd))
Alternatively, with keyword arguments:
voting_df['pop_estimate'].apply(normalization, col_mean=pop_mean, col_sd=pop_sd)
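A minimal runnable check of both calling styles, assuming a small made-up Series in place of the real voting_df column. It also shows that the numpy.float64 values returned by .mean() and .std() work fine once they actually reach normalization.

```python
import pandas as pd


def normalization(val_to_norm, col_mean, col_sd):
    return (val_to_norm - col_mean) / col_sd


# Illustrative stand-in for voting_df['pop_estimate'].
s = pd.Series([10.0, 20.0, 30.0], name="pop_estimate")
pop_mean, pop_sd = s.mean(), s.std()  # numpy.float64 scalars: 20.0 and 10.0

# Extra positional values for normalization go inside the args tuple ...
by_args = s.apply(normalization, args=(pop_mean, pop_sd))
# ... or are forwarded by name through **kwargs.
by_kwargs = s.apply(normalization, col_mean=pop_mean, col_sd=pop_sd)

print(by_args.tolist())  # [-1.0, 0.0, 1.0]
```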
Answer:
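It has nothing to do with the data type. You are passing pop_mean and pop_sd as positional arguments, so they are consumed by apply itself rather than by normalization: with the signature apply(func, convert_dtype=True, args=(), **kwargs), pop_mean binds to convert_dtype and pop_sd binds to args, and pandas then fails when it tries to unpack pop_sd with * (hence "Value after * must be an iterable, not numpy.float64").
To pass them on to normalization, use args or keyword arguments: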
# sample data setup
import pandas as pd

voting_df = pd.DataFrame({"pop_estimate": range(3144)})

def normalization(val_to_norm, col_mean, col_sd):
    return (val_to_norm - col_mean) / col_sd

pop_mean, pop_sd = voting_df['pop_estimate'].mean(), voting_df['pop_estimate'].std()
Method 1: Use args:
method1 = voting_df['pop_estimate'].apply(normalization, args=(pop_mean, pop_sd))
Method 2: Use keyword arguments:
method2 = voting_df['pop_estimate'].apply(normalization,
                                          col_mean=pop_mean,
                                          col_sd=pop_sd)
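Besides, in your case you don't need apply at all. Because normalization only uses arithmetic operators, you can call it directly on the whole Series, which is vectorized and much faster than calling it once per element: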
method3 = normalization(voting_df["pop_estimate"], pop_mean, pop_sd)
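For the question's original goal of overwriting the column, the direct call can simply be assigned back. The four-row DataFrame below is invented for illustration; the final prints just confirm that the normalized column has a mean close to 0 and a sample standard deviation close to 1.

```python
import pandas as pd


def normalization(val_to_norm, col_mean, col_sd):
    return (val_to_norm - col_mean) / col_sd


# Illustrative data; the real voting_df comes from the question.
voting_df = pd.DataFrame({"pop_estimate": [10.0, 20.0, 30.0, 40.0]})

col = voting_df["pop_estimate"]
# Series arithmetic broadcasts the scalar mean and sd across every element,
# so the whole column is normalized in a single vectorized call.
voting_df["pop_estimate"] = normalization(col, col.mean(), col.std())

print(voting_df["pop_estimate"].mean())  # approximately 0
print(voting_df["pop_estimate"].std())   # approximately 1
```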
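Or, even better, use an existing, well-tested library function such as scipy.stats.zscore. Pass ddof=1 so that it uses the sample standard deviation, matching the default of pandas' Series.std():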
from scipy.stats import zscore
method4 = zscore(voting_df["pop_estimate"], ddof=1)
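The ddof=1 argument matters because the libraries use different defaults: pandas' Series.std() computes the sample standard deviation (ddof=1), whereas np.std and scipy.stats.zscore default to ddof=0. A small sketch with made-up values, using only NumPy and pandas:

```python
import numpy as np
import pandas as pd

values = np.array([10.0, 20.0, 30.0, 40.0])

# pandas divides by n - 1 by default (sample standard deviation) ...
pandas_sd = pd.Series(values).std()
# ... while np.std (like scipy.stats.zscore) divides by n unless told otherwise.
numpy_sd_default = np.std(values)
numpy_sd_sample = np.std(values, ddof=1)

print(pandas_sd, numpy_sd_default, numpy_sd_sample)
```

Without ddof=1, zscore would divide by the smaller population standard deviation and its result would not match method1 to method3.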
Validation:
import numpy as np
np.all([
    np.array_equal(method1, method2),
    np.array_equal(method2, method3),
    np.array_equal(method3, method4)
])
# True
