I want to count, for each row of a pandas column, how many of the values in a list occur in it.
lst = ['place','wait','ok','amazing','beautiful']
| ID | TEXT |
|---|---|
| 1 | beautiful place ,me |
| 1 | ok ,good work |
| 2 | wait for me ,ok |
| 2 | amazing place |
| 3 | amazing day |
| 3 | amazing country |
| 3 | amazing world |
| 3 | thank you |
The output should look like this:
| ID | OCCURENCES |
|---|---|
| 1 | 2 |
| 1 | 1 |
| 2 | 2 |
| 2 | 2 |
| 3 | 1 |
| 3 | 1 |
| 3 | 1 |
| 3 | 0 |
My attempt:
df['OCCURENCES'] = pd.DataFrame([df['TEXT'].str.count(c) for c in lst]).sum()
Answer:
Split each text into words and use a set intersection for efficiency. Note that this counts each distinct list word at most once per row:
lst = ['place','wait','ok','amazing','beautiful']
words = set(lst)
df['OCCURENCES'] = [len(words.intersection(x)) for x in df['TEXT'].str.split(r'\W+')]
Output:
   ID                 TEXT  OCCURENCES
0   1  beautiful place ,me           2
1   1        ok ,good work           1
2   2      wait for me ,ok           2
3   2        amazing place           2
4   3          amazing day           1
5   3      amazing country           1
6   3        amazing world           1
7   3            thank you           0
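
For anyone who wants to try this end to end, here is a minimal self-contained sketch that rebuilds the sample frame from the question and applies the set-intersection approach. The second part is an assumption about a possible follow-up need, not something the question asked for: it counts every repeat of a list word using `str.count` with a word-boundary regex, and the column name `TOTAL_HITS` is hypothetical.

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 3, 3, 3, 3],
    "TEXT": [
        "beautiful place ,me",
        "ok ,good work",
        "wait for me ,ok",
        "amazing place",
        "amazing day",
        "amazing country",
        "amazing world",
        "thank you",
    ],
})

lst = ["place", "wait", "ok", "amazing", "beautiful"]
words = set(lst)

# Split each row on runs of non-word characters, then count how many
# distinct list words appear among the resulting tokens.
tokens = df["TEXT"].str.split(r"\W+")
df["OCCURENCES"] = [len(words.intersection(t)) for t in tokens]

# Hypothetical variant: count every whole-word match, including repeats.
# re.escape guards against list entries that contain regex metacharacters.
pattern = r"\b(?:" + "|".join(map(re.escape, lst)) + r")\b"
df["TOTAL_HITS"] = df["TEXT"].str.count(pattern)

print(df)
```

On the sample data both columns agree, because no list word is repeated within a row. They would differ for a text such as "ok ok", where the set version gives 1 and the regex version gives 2.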
