I am puzzled and have no idea what is happening. My script contains the following line, which combines the contents of three columns of a dataframe into one column (only for rows that fulfill the specified condition):
share_data_sm[yr]['MMR'] = np.where((share_data_sm[yr]['MC']!='MA') & (share_data_sm[yr]['MC']!=' ') & (share_data_sm[yr]['MY']!=' '), share_data_sm[yr]['MC'].astype(str) + share_data_sm[yr]['N'].astype(str) + share_data_sm[yr]['MFR'].astype(str), share_data_sm[yr]['MFR'])
'share_data_sm' is a dictionary of dataframes, one table per 'yr'. What puzzles me the most is that the error is thrown for only one particular value of 'yr': the command is part of a loop over several values of 'yr', and the script runs smoothly for all of them except 2021. I thought there might be some peculiarity in the contents of the 2021 dataframe, but I see nothing exceptional there; everything looks exactly like the others. The following is the traceback from the console:
Traceback (most recent call last):
File "…ipykernel_1380/3858926177.py", line 1, in <module>
runfile('…work_folder/Groups/Structure/shareholding.py', wdir='…_work_folder/Groups/Structure')
File "…pydevd\_pydev_bundle\pydev_umd.py", line 167, in runfile
execfile(filename, namespace)
File "…pydevd\_pydev_imps\_pydev_execfile.py", line 25, in execfile
exec(compile(contents + "\n", file, 'exec'), glob, loc)
File "…work_folder/Groups/Structure/shareholding.py", line 281, in <module>
share_data_sm[yr]['MMR']= np.where((share_data_sm[yr]['MC']!='MA') & (share_data_sm[yr]['MC']!=' ') & (share_data_sm[yr]['MY']!=' '), share_data_sm[yr]['MC'].astype(str) + share_data_sm[yr]['N'].astype(str) + share_data_sm[yr]['MFR'].astype(str), share_data_sm[yr]['MFR'])
File "…pandas\core\ops\common.py", line 69, in new_method
return method(self, other)
File "…pandas\core\arraylike.py", line 96, in __radd__
return self._arith_method(other, roperator.radd)
File "…pandas\core\frame.py", line 6864, in _arith_method
self, other = ops.align_method_FRAME(self, other, axis, flex=True, level=None)
File "…pandas\core\ops\__init__.py", line 306, in align_method_FRAME
left, right = left.align(
File "…pandas\core\frame.py", line 4677, in align
return super().align(
File "…pandas\core\generic.py", line 8591, in align
return self._align_series(
File "…pandas\core\generic.py", line 8708, in _align_series
join_index, lidx, ridx = join_index.join(
File "…pandas\core\indexes\base.py", line 207, in join
join_index, lidx, ridx = meth(self, other, how=how, level=level, sort=sort)
File "…pandas\core\indexes\base.py", line 3987, in join
return this.join(other, how=how, return_indexers=True)
File "…pandas\core\indexes\base.py", line 207, in join
join_index, lidx, ridx = meth(self, other, how=how, level=level, sort=sort)
File "…pandas\core\indexes\base.py", line 3995, in join
return self._join_monotonic(other, how=how)
File "…pandas\core\indexes\base.py", line 4327, in _join_monotonic
join_array, lidx, ridx = self._outer_indexer(other)
File "…pandas\core\indexes\base.py", line 345, in _outer_indexer
joined_ndarray, lidx, ridx = libjoin.outer_join_indexer(sv, ov)
File "…pandas\_libs\join.pyx", line 562, in pandas._libs.join.outer_join_indexer
TypeError: '<' not supported between instances of 'str' and 'int'
I'll appreciate any help - how can I overcome the problem?
CodePudding user response:
I think I see it.
The code can be reformatted as:
condition = \
(share_data_sm[yr]['MC']!='MA') & \
(share_data_sm[yr]['MC']!=' ') & \
(share_data_sm[yr]['MY']!=' ')
val_if_true = share_data_sm[yr]['MC'].astype(str) + share_data_sm[yr]['N'].astype(str) + share_data_sm[yr]['MFR'].astype(str)
val_if_false = share_data_sm[yr]['MFR']
share_data_sm[yr]['MMR'] = np.where(condition, val_if_true, val_if_false)
Now you can see that val_if_true and val_if_false hold different value types: in the first case you add together three str values, while in the second you keep whatever dtype share_data_sm[yr]['MFR'] has.
I bet it complains when you try to put both types into the same array.
CodePudding user response:
The traceback says the error is in the complicated expression
np.where((share_data_sm[yr]['MC']!='MA') & (share_data_sm[yr]['MC']!=' ') & (share_data_sm[yr]['MY']!=' '), share_data_sm[yr]['MC'].astype(str) + share_data_sm[yr]['N'].astype(str) + share_data_sm[yr]['MFR'].astype(str), share_data_sm[yr]['MFR'])
But keep in mind that Python has to evaluate each of the 3 arguments before passing them to where:
(share_data_sm[yr]['MC']!='MA') & (share_data_sm[yr]['MC']!=' ') & (share_data_sm[yr]['MY']!=' ')
share_data_sm[yr]['MC'].astype(str) + share_data_sm[yr]['N'].astype(str) + share_data_sm[yr]['MFR'].astype(str)
share_data_sm[yr]['MFR']
The traceback is a little hard to read, but the str error suggests the problem is in the middle argument, even though you are only adding string values there.
I am also seeing this.join and index calls, which suggests that pandas is trying to line up the indices of the operands. So the frame indices may be mostly strings, with an oddball numeric index. But this is just a guess; I'm not a pandas expert.
I'd suggest evaluating those 3 arguments beforehand, before using them in where, to better isolate the problem. Long one-line expressions are hard to debug.
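As a minimal sketch of that isolation step: the traceback passes through `__radd__` in pandas\core\frame.py, which hints that one operand of the addition may be a DataFrame rather than a Series. One way that can happen (an assumption, not something confirmed for the 2021 data) is a duplicated column label, because `df[label]` then returns a DataFrame. The helper `describe_pieces` and the toy frame below are hypothetical and only illustrate the check; on the real data you would call it on `share_data_sm[yr]` for each `yr`.

```python
import pandas as pd

def describe_pieces(df, cols=("MC", "N", "MFR")):
    """For each label, report what df[label] returns and how many columns carry that label."""
    report = {}
    for col in cols:
        piece = df[col]  # Series for a unique label, DataFrame for a duplicated one
        report[col] = (type(piece).__name__, int((df.columns == col).sum()))
    return report

# Hypothetical toy frame: the label 'N' appears twice,
# as it might after a merge or rename went wrong for one year.
toy = pd.DataFrame(
    [["AB", "1", "2", "X"], ["MA", "3", "4", "Y"]],
    columns=["MC", "N", "N", "MFR"],
)

report = describe_pieces(toy)
duplicated_labels = toy.columns[toy.columns.duplicated()].tolist()

print(report)             # type and label count per column
print(duplicated_labels)  # labels that occur more than once
```

If any entry reports `DataFrame`, adding a Series to it makes pandas align the Series index against the frame's column labels, which would be consistent with the join calls in the traceback. If every entry reports `Series`, this guess is wrong and the next thing to inspect is the index of each piece.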
