Creating a new column using string match and based on if-else condition

Time:01-04

I have a data frame with a column 'url_text' containing text output from an OCR. I am trying to create a new column 'blocked' that is 1 for rows where a condition is met and 0 otherwise.

df[df['url_text'].str.contains('blocked you')] # detect all rows in 'url_text' column 
# that contain 'blocked you'. Code works.  

I have tried to insert the above code in the following function. However, when I apply the function to my data frame I get the following error:

def f(row):
    if row['url_text'] == df[df['url_text'].str.contains('blocked you')]:
        val = 1
    else:
        val = 0
    return val
df['blocked'] = df.apply(f)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Users/marcoliedecke/Desktop/Who_Blocks_Who?/Code/venv/lib/python3.9/site-packages/pandas/core/frame.py", line 8740, in apply
    return op.apply()
  File "/Users/marcoliedecke/Desktop/Who_Blocks_Who?/Code/venv/lib/python3.9/site-packages/pandas/core/apply.py", line 688, in apply
    return self.apply_standard()
  File "/Users/marcoliedecke/Desktop/Who_Blocks_Who?/Code/venv/lib/python3.9/site-packages/pandas/core/apply.py", line 812, in apply_standard
    results, res_index = self.apply_series_generator()
  File "/Users/marcoliedecke/Desktop/Who_Blocks_Who?/Code/venv/lib/python3.9/site-packages/pandas/core/apply.py", line 828, in apply_series_generator
    results[i] = self.f(v)
  File "<input>", line 3, in f
  File "/Users/marcoliedecke/Desktop/Who_Blocks_Who?/Code/venv/lib/python3.9/site-packages/pandas/core/series.py", line 942, in __getitem__
    return self._get_value(key)
  File "/Users/marcoliedecke/Desktop/Who_Blocks_Who?/Code/venv/lib/python3.9/site-packages/pandas/core/series.py", line 1051, in _get_value
    loc = self.index.get_loc(label)
  File "/Users/marcoliedecke/Desktop/Who_Blocks_Who?/Code/venv/lib/python3.9/site-packages/pandas/core/indexes/range.py", line 388, in get_loc
    raise KeyError(key)
KeyError: 'url_text'

CodePudding user response:

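The root problem here is that your code compares a single string (row['url_text']) to a dataframe (df[df...]). The KeyError itself comes from calling df.apply(f) without axis=1: by default, apply passes each column to f rather than each row, so row['url_text'] looks up a row label that does not exist.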
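To see where the KeyError in the traceback comes from, it may help to check what apply actually hands to the function. The sketch below uses a small made-up data frame (the sample strings and the helper name get_url_text are assumptions, not from the question) and shows that the same lookup fails with the default axis and works with axis=1.

```python
import pandas as pd

# Hypothetical sample data standing in for the OCR output.
df = pd.DataFrame({"url_text": ["someone blocked you", "nothing here"]})

def get_url_text(row):
    # Works only if `row` is indexed by column names.
    return row["url_text"]

# Default axis=0: each COLUMN is passed in, indexed by row labels 0, 1, ...
# so the lookup by column name raises KeyError, as in the traceback.
try:
    df.apply(get_url_text)
    raised_key_error = False
except KeyError:
    raised_key_error = True

# axis=1: each ROW is passed in, indexed by column names, so the lookup works.
row_values = df.apply(get_url_text, axis=1).tolist()

print(raised_key_error)
print(row_values)
```

So even before the comparison against the data frame is reached, the function needs axis=1 to receive rows.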

Instead of referencing df inside your function, just use methods that are defined on the row itself. You can also implement this as a lambda function to be closer to the canonical examples.

df['blocked'] = df.apply(
    lambda row: 1 if 'blocked you' in row['url_text'] else 0,
    axis=1
)
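As an alternative to apply, the boolean mask from the question can be converted to integers directly. This is a minimal sketch on made-up sample data; regex=False and na=False are choices I am assuming fit your case (literal substring match, missing OCR text counted as not blocked).

```python
import pandas as pd

# Hypothetical sample data; the None stands for a row where OCR produced no text.
df = pd.DataFrame({"url_text": ["someone blocked you", "nothing here", None]})

# str.contains returns a boolean Series; astype(int) maps True/False to 1/0.
# regex=False treats 'blocked you' as a literal substring.
# na=False makes missing values count as "no match" instead of propagating NaN.
df["blocked"] = (
    df["url_text"]
    .str.contains("blocked you", regex=False, na=False)
    .astype(int)
)

print(df)
```

This avoids the row-by-row Python call, which tends to matter on larger frames, and it also handles missing values. The lambda version would raise a TypeError on a missing value, because the in operator is not defined for NaN or None.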