I have the following DataFrame with the columns low_scarcity and high_scarcity (each row has an entry in either low_scarcity or high_scarcity):
| id | low_scarcity | high_scarcity |
|---|---|---|
| 0 | When I was five.. | |
| 1 | | I worked a lot... |
| 2 | | I went to parties... |
| 3 | 1 week ago | |
| 4 | 2 months ago | |
| 5 | | another story.. |
I want to create another column, 'target', whose value is 0 when there's an entry in the low_scarcity column and 1 when there's an entry in the high_scarcity column, like this:
| id | low_scarcity | high_scarcity | target |
|---|---|---|---|
| 0 | When I was five.. | | 0 |
| 1 | | I worked a lot... | 1 |
| 2 | | I went to parties... | 1 |
| 3 | 1 week ago | | 0 |
| 4 | 2 months ago | | 0 |
| 5 | | another story.. | 1 |
I first tried replacing the entries with no value with 0 and then creating a boolean condition. However, I can't use .replace('',0) because the cells that are empty don't appear as empty strings.
CodePudding user response:
Supposing your dataframe is called df and that each row has an entry in either high_scarcity or low_scarcity, the following line of code does it:
import numpy as np
df['target'] = 1*np.array(df['high_scarcity']!="")
Here the 1* converts the boolean values to the integers 0 and 1.
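As a quick check, here is a minimal sketch of that one-liner on a small sample frame. The story texts are taken from the question, but the frame itself is hypothetical and assumes that empty cells are stored as empty strings. It also shows `.astype(int)`, which should give the same result as `1*` and may read more explicitly:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: empty cells are assumed to be empty strings.
df = pd.DataFrame({
    "low_scarcity": ["When I was five..", "", "", "1 week ago"],
    "high_scarcity": ["", "I worked a lot...", "I went to parties...", ""],
})

# Boolean mask: True where high_scarcity has an entry.
has_high = df["high_scarcity"] != ""

# Multiplying by 1 and calling .astype(int) both turn True/False into 1/0.
df["target"] = 1 * np.array(has_high)
target_alt = has_high.astype(int)

print(df["target"].tolist())
```

This relies on each row having an entry in exactly one of the two columns.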
If a row can have entries in both columns or in neither, a more elaborate approach is needed:
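# dtype=object keeps the assigned 0/1 as integers; rows with no entry keep ""
res = np.full(df.shape[0], "", dtype=object)
res[np.array(df['high_scarcity'] != "")] = 1
res[np.array(df['low_scarcity'] != "")] = 0  # low_scarcity wins if both columns are filled
df['target'] = res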
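The question mentions that the empty cells do not show up as empty strings. One common reason (an assumption here, since the actual data is not shown) is that they are stored as NaN/None or contain only whitespace, in which case comparing against "" would treat them as filled. A hedged sketch that covers those cases follows; `has_entry` is a hypothetical helper name, and the sample frame is made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: empty cells are assumed to be NaN, None, or whitespace.
df = pd.DataFrame({
    "low_scarcity": ["When I was five..", np.nan, " ", "1 week ago"],
    "high_scarcity": [np.nan, "I worked a lot...", "I went to parties...", None],
})

def has_entry(col: pd.Series) -> pd.Series:
    """True where the cell holds non-blank text (not NaN, None, or whitespace)."""
    return col.notna() & (col.astype(str).str.strip() != "")

# Start with missing targets in a nullable integer column,
# then set 1 for high_scarcity entries and 0 for low_scarcity entries.
df["target"] = pd.array([pd.NA] * len(df), dtype="Int64")
df.loc[has_entry(df["high_scarcity"]), "target"] = 1
df.loc[has_entry(df["low_scarcity"]), "target"] = 0

print(df["target"].tolist())
```

As in the snippet above, low_scarcity is applied last, so it takes precedence if both columns are filled; rows with no entry in either column keep a missing target.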
