Iterating through a DataFrame and counting the words for specific values

I have a DataFrame with two columns, [phrase] and [category], so every phrase has a specific category. I'm trying to iterate through the DataFrame and count all the words for a specific category. For example, say the category is news: I want to find all the phrases with the category news and count the total number of words they contain.

I hope somebody can help me. I'm using Python and pandas.

Thanks

CodePudding user response：

You could start by collecting the phrases for each category in a dictionary:

import pandas as pd
df = pd.DataFrame({
    "Phrases":["Hello, how are you!","I am Good!","Do you want to come over?"],
    "Category":["Question","Answer","Question"]
})
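# Map each category to the list of phrases that belong to it
l = {}
for phrase, category in zip(df["Phrases"], df["Category"]):
    # setdefault creates an empty list the first time a category is seen
    l.setdefault(category, []).append(phrase)
print(l)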

Output:

{'Question': ['Hello, how are you!', 'Do you want to come over?'], 'Answer': ['I am Good!']}
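
The dictionary groups the phrases but does not count words yet. A minimal sketch of that last step, assuming a "word" is any whitespace-separated token (so punctuation stays attached to the word) and using the hypothetical names `grouped` and `word_counts`:

```python
# Phrases grouped by category, same structure as the dictionary printed above
grouped = {
    "Question": ["Hello, how are you!", "Do you want to come over?"],
    "Answer": ["I am Good!"],
}

# For each category, split every phrase on whitespace and add up the token counts
word_counts = {
    category: sum(len(phrase.split()) for phrase in phrases)
    for category, phrases in grouped.items()
}

print(word_counts)  # {'Question': 10, 'Answer': 3}
```

Looking up a single category is then just `word_counts["Question"]`.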

CodePudding user response：

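I believe you can just use the groupby function. Note that count() returns the number of non-null rows (phrases) per category, so it equals the word count only when every phrase is a single word, as in the example further down. For instance: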

out = df.groupby('category').count()

As an example:

import pandas as pd
df = pd.DataFrame({
    'phrase': ["basketball", "football", "tennis", "bread", "honey", "nbc", "cnn", "fox", "bloomberg"],
    'category': ["sports", "sports", "sports", "food", "food", "news", "news", "news", "news"],
})

out = df.groupby('category').count()

print(out)

Output:

          phrase
category        
food           2
news           4
sports         3
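
If the phrases contain more than one word each, the same groupby idea can be applied to a per-phrase word count instead of to the rows. A rough sketch, assuming words are separated by whitespace and using a helper column named `n_words` (a name chosen here for illustration, not something from the question):

```python
import pandas as pd

df = pd.DataFrame({
    "phrase": ["Hello, how are you!", "I am Good!", "Do you want to come over?"],
    "category": ["Question", "Answer", "Question"],
})

# Number of whitespace-separated tokens in each phrase
df["n_words"] = df["phrase"].str.split().str.len()

# Total number of words per category
totals = df.groupby("category")["n_words"].sum()
print(totals)

# Total for one specific category only
question_total = df.loc[df["category"] == "Question", "n_words"].sum()
print(question_total)  # 10
```

Here `totals` holds 3 for Answer and 10 for Question. If punctuation or repeated words should be handled differently (for example counting only unique words), the splitting step is the part to adjust.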