I have a column in a DataFrame that looks like this:
| Col1 |
|---|
| A |
| B |
| A |
| C |
| B |
I want to add a boolean column that indicates, for each row, whether the value in that row is appearing for the first time, i.e. it has not appeared in any previous row. The desired output would look like this:
| Col1 | col2 |
|---|---|
| A | True |
| B | True |
| A | False |
| C | True |
| B | False |
How can I achieve this? I've tried window.expanding() with isin(), but it appears to apply to numeric columns only (mine contains only strings).
CodePudding user response:
Use Series.duplicated and invert the resulting mask with ~. An alternative solution is to use DataFrame.duplicated and specify the column name:
df['col2'] = ~df['Col1'].duplicated()
# alternative solution
# df['col2'] = ~df.duplicated('Col1')
print(df)
Col1 col2
0 A True
1 B True
2 A False
3 C True
4 B False
Details:
print (df['Col1'].duplicated())
0 False
1 False
2 True
3 False
4 True
Name: Col1, dtype: bool
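For anyone who wants to reproduce this end to end, here is a minimal sketch. The DataFrame is rebuilt by hand from the sample in the question, so the construction step is an assumption about how the data was created, not part of the original post:

```python
import pandas as pd

# Sample data rebuilt from the question (assumed construction).
df = pd.DataFrame({"Col1": ["A", "B", "A", "C", "B"]})

# duplicated() flags every repeat of an earlier value as True
# (keep="first" is the default), so inverting it with ~ flags
# the first occurrence of each value instead.
df["col2"] = ~df["Col1"].duplicated()

print(df)
```

Since duplicated compares values for equality, it works on string columns just as it does on numeric ones.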
CodePudding user response:
Just use duplicated and invert the result with ~:
df['col2'] = ~df['Col1'].duplicated()
Output:
Col1 col2
0 A True
1 B True
2 A False
3 C True
4 B False
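If a slightly different flag is needed, the keep parameter of duplicated may help. The following is only a sketch on the question's sample data; the names last_seen and unique_only are made up for illustration:

```python
import pandas as pd

# Sample data rebuilt from the question (assumed construction).
s = pd.Series(["A", "B", "A", "C", "B"], name="Col1")

# keep="last": only the final occurrence of each value is not a duplicate,
# so the inverted mask is True on the last time each value is seen.
last_seen = ~s.duplicated(keep="last")

# keep=False: every member of a repeated group counts as a duplicate,
# so the inverted mask is True only for values that occur exactly once.
unique_only = ~s.duplicated(keep=False)

print(last_seen.tolist())
print(unique_only.tolist())
```

The default, keep="first", is the one that matches the output requested in the question.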
