I have a DataFrame with two columns, [phrase] and [category], so every phrase has a specific category. I want to iterate through the DataFrame and count all the words for a specific category. For example, say the category is news: I want to find all the phrases with the category news and count the total number of words they use.
I hope somebody can help me. I'm using Python and pandas.
Thanks
CodePudding user response:
You could group the phrases by category first:
import pandas as pd
df = pd.DataFrame({
    "Phrases": ["Hello, how are you!", "I am Good!", "Do you want to come over?"],
    "Category": ["Question", "Answer", "Question"]
})
# Map each category to the list of its phrases
l = {}
for phrase, category in zip(df["Phrases"], df["Category"]):
    try:
        l[category].append(phrase)
    except KeyError:
        # First phrase seen for this category
        l[category] = [phrase]
print(l)
Output:
{'Question': ['Hello, how are you!', 'Do you want to come over?'], 'Answer': ['I am Good!']}
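The dictionary above only collects the phrases; it does not count words yet. A minimal sketch of taking the same loop one step further, assuming a "word" is simply a whitespace-separated token (so punctuation stays attached to its word) and using a `word_counts` name that is not part of the original answer:

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({
    "Phrases": ["Hello, how are you!", "I am Good!", "Do you want to come over?"],
    "Category": ["Question", "Answer", "Question"],
})

# Assumption: words are whitespace-separated tokens.
# word_counts maps each category to the total number of words in its phrases.
word_counts = defaultdict(int)
for phrase, category in zip(df["Phrases"], df["Category"]):
    word_counts[category] += len(phrase.split())

print(dict(word_counts))
# {'Question': 10, 'Answer': 3}
```

To get the total for a single category, look it up directly, e.g. `word_counts["Question"]`.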
CodePudding user response:
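I believe you can just use the groupby function. Note that count() returns the number of phrases (rows) in each category, not the number of words inside them. For instance: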
out = df.groupby('category').count()
As an example:
import pandas as pd
df = pd.DataFrame({
    'phrase': ["basketball", "football", "tennis", "bread", "honey", "nbc", "cnn", "fox", "bloomberg"],
    'category': ["sports", "sports", "sports", "food", "food", "news", "news", "news", "news"]
})
out = df.groupby('category').count()
print(out)
Output:
          phrase
category
food           2
news           4
sports         3
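Since every phrase in this example is a single word, the row count and the word count happen to be the same. For multi-word phrases, one possible sketch is to count the words per phrase first and then sum them per category. The sample data below is made up for illustration, and a "word" is again assumed to be a whitespace-separated token:

```python
import pandas as pd

# Hypothetical sample data with multi-word phrases (not from the question).
df = pd.DataFrame({
    "phrase": ["breaking news tonight", "weather update", "final score"],
    "category": ["news", "news", "sports"],
})

# Number of whitespace-separated words in each phrase.
words_per_phrase = df["phrase"].str.split().str.len()

# Sum the per-phrase word counts within each category.
total_words = words_per_phrase.groupby(df["category"]).sum()
print(total_words)

# Total for one specific category: 3 + 2 = 5 words.
news_total = total_words["news"]
print(news_total)
```

If only one category is needed, filtering first with `df[df["category"] == "news"]` and summing the word counts of that subset should give the same number.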
