'STOPWORDS' is not defined after importing stopwords

Here is my code. I imported stopwords, but Python says STOPWORDS is not defined.

import nltk
from nltk.corpus import stopwords
#Create stopword list:
stopwords = set(STOPWORDS)

This gives:

NameError: name 'STOPWORDS' is not defined

CodePudding user response：

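The name STOPWORDS is never defined: the import gives you stopwords (lowercase), which is NLTK's corpus reader, and Python names are case-sensitive. You also need to download the stop word lists and choose the language you want. For example, to print the English stop words: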

import nltk
from nltk.corpus import stopwords
nltk.download('stopwords')
print(stopwords.words('english'))

This should print a list of English stop words such as ['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', ...]
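
Applied to the code in the question, the fix might look like the sketch below. It assumes NLTK is installed; the helper name load_stop_words is hypothetical and not part of NLTK. The set() call gives the kind of object the original set(STOPWORDS) line was after. If the uppercase name came from a tutorial, it may have been the STOPWORDS constant exported by the separate wordcloud package, which NLTK does not define.

```python
def load_stop_words(language="english"):
    """Return NLTK's stop words for `language` as a set (hypothetical helper)."""
    # Imported inside the function so that defining it does not require NLTK.
    import nltk
    from nltk.corpus import stopwords

    try:
        # Check whether the stopwords corpus is already on disk.
        nltk.data.find("corpora/stopwords")
    except LookupError:
        # First run only: fetch the corpus (needs network access).
        nltk.download("stopwords")
    return set(stopwords.words(language))
```

Calling stop_words = load_stop_words() would then replace the failing stopwords = set(STOPWORDS) line, and it also avoids overwriting the imported stopwords name.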

CodePudding user response：

As pointed out earlier, the first time you run your code you need to include the following line, so that the stop word lists are downloaded to your computer:

nltk.download('stopwords')

Then, you can load, for example, the English stop words list as follows:

stop_words = list(stopwords.words('english'))

and even extend it, if you need to:

stop_words.extend(["best", "item", "fast"])

Use it to remove stop words from text:

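from nltk.tokenize import word_tokenize
# word_tokenize needs tokenizer data: nltk.download('punkt')
# (depending on your NLTK version it may ask for 'punkt_tab' instead)
text = "This is the best item and it arrived fast"  # placeholder input
# tokenise the text and remove stop words
word_tokens = word_tokenize(text)
clean_word_data = [w for w in word_tokens if w.lower() not in stop_words]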
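
If the text is long, it can help to turn the stop word list into a set before filtering, since membership tests on a list scan every element. Below is a minimal sketch of that idea. The function remove_stop_words and the demo_ variables are made-up names, and the tiny stop word list is a stand-in so the example runs without NLTK.

```python
def remove_stop_words(tokens, stop_words):
    """Return the tokens whose lowercase form is not in stop_words."""
    # A set makes each membership test O(1) on average; a list is scanned linearly.
    stop_set = {w.lower() for w in stop_words}
    return [t for t in tokens if t.lower() not in stop_set]


# Stand-in data so the sketch runs without NLTK; in practice use
# stopwords.words('english') and word_tokenize(text) instead.
demo_stop_words = ["the", "is", "and", "it", "best", "item", "fast"]
demo_tokens = "The item is small and it works".split()
print(remove_stop_words(demo_tokens, demo_stop_words))  # ['small', 'works']
```

The custom words "best", "item" and "fast" are treated exactly like the built-in ones, which is the same effect as the extend() call shown earlier.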