I have a data frame with a column 'url_text' containing text output from OCR. I am trying to create a new column 'blocked' that equals 1 in rows where a condition is met and 0 otherwise.
df[df['url_text'].str.contains('blocked you')] # detect all rows in 'url_text' column
# that contain 'blocked you'. Code works.
I have tried to insert the above code in the following function. However, when I apply the function to my data frame I get the following error:
def f(row):
    if row['url_text'] == df[df['url_text'].str.contains('blocked you')]:
        val = 1
    else:
        val = 0
    return val
df['blocked'] = df.apply(f)
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/Users/marcoliedecke/Desktop/Who_Blocks_Who?/Code/venv/lib/python3.9/site-packages/pandas/core/frame.py", line 8740, in apply
return op.apply()
File "/Users/marcoliedecke/Desktop/Who_Blocks_Who?/Code/venv/lib/python3.9/site-packages/pandas/core/apply.py", line 688, in apply
return self.apply_standard()
File "/Users/marcoliedecke/Desktop/Who_Blocks_Who?/Code/venv/lib/python3.9/site-packages/pandas/core/apply.py", line 812, in apply_standard
results, res_index = self.apply_series_generator()
File "/Users/marcoliedecke/Desktop/Who_Blocks_Who?/Code/venv/lib/python3.9/site-packages/pandas/core/apply.py", line 828, in apply_series_generator
results[i] = self.f(v)
File "<input>", line 3, in f
File "/Users/marcoliedecke/Desktop/Who_Blocks_Who?/Code/venv/lib/python3.9/site-packages/pandas/core/series.py", line 942, in __getitem__
return self._get_value(key)
File "/Users/marcoliedecke/Desktop/Who_Blocks_Who?/Code/venv/lib/python3.9/site-packages/pandas/core/series.py", line 1051, in _get_value
loc = self.index.get_loc(label)
File "/Users/marcoliedecke/Desktop/Who_Blocks_Who?/Code/venv/lib/python3.9/site-packages/pandas/core/indexes/range.py", line 388, in get_loc
raise KeyError(key)
KeyError: 'url_text'
CodePudding user response:
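There are two problems here. The KeyError in your traceback comes from calling df.apply(f) without axis=1: by default apply passes each column to f, not each row, so row['url_text'] looks for 'url_text' among the row labels and fails. Beyond that, your code compares a single string (row['url_text']) to a dataframe (df[df...]), which would not work as an if condition even with axis=1.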
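The KeyError from the traceback can be reproduced on a toy frame. This is a small sketch with made-up sample data (the strings and the helper name has_url_text_label are only for illustration); it shows what apply hands to the function with the default axis=0 versus axis=1.

```python
import pandas as pd

# Hypothetical sample data standing in for the OCR output.
df = pd.DataFrame({"url_text": ["This user has blocked you", "hello world"]})

def has_url_text_label(series):
    # True if 'url_text' is one of the labels of the Series passed in by apply.
    return "url_text" in series.index

# Default axis=0: each *column* is passed in, indexed by the row labels 0, 1.
by_column = df.apply(has_url_text_label)

# axis=1: each *row* is passed in, indexed by the column names.
by_row = df.apply(has_url_text_label, axis=1)

# Looking up 'url_text' on a column fails, as in the traceback above.
try:
    df.apply(lambda row: row["url_text"])
    raised = False
except KeyError:
    raised = True
```

Here by_column is False for the single column, by_row is True for every row, and raised ends up True.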
Instead of referencing df inside your function, just use methods that are defined on the row itself. You can also implement this as a lambda function to be closer to the canonical examples.
df['blocked'] = df.apply(
    lambda row: 1 if 'blocked you' in row['url_text'] else 0,
    axis=1
)
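As an alternative to apply, the str.contains call from the question already yields one boolean per row, so it can be converted to 0/1 directly. The following is a sketch under some assumptions: the sample data is invented, regex=False treats 'blocked you' as a literal string, and na=False is chosen here so that missing OCR text counts as not blocked.

```python
import pandas as pd

# Hypothetical sample data, including a missing value.
df = pd.DataFrame({"url_text": ["This user has blocked you", "hello world", None]})

# One boolean per row, then cast to integers (True -> 1, False -> 0).
df["blocked"] = (
    df["url_text"]
    .str.contains("blocked you", regex=False, na=False)
    .astype(int)
)
```

This avoids calling a Python function once per row and handles missing values, which the `in` check inside the lambda would not.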
