Here is my code. I imported stopwords, but it says that STOPWORDS is not defined.
import nltk
from nltk.corpus import stopwords
#Create stopword list:
stopwords = set(STOPWORDS)
This gives:
NameError: name 'STOPWORDS' is not defined
CodePudding user response:
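The NameError is raised because nothing named STOPWORDS was ever defined; the import only gives you the lowercase stopwords corpus reader. You also need to download the stopwords corpus before you can use it. For example, to print the stopwords used in English: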
import nltk
from nltk.corpus import stopwords
nltk.download('stopwords')
print(stopwords.words('english'))
This should print the list of English stopwords, such as 'i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', and so on.
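Putting this together, the line from the question could be rewritten roughly as follows. This is only a sketch: the helper name load_stopwords is made up for illustration, and it assumes NLTK is installed and that the machine can reach the download server the first time it runs.

```python
def load_stopwords(language="english"):
    """Return NLTK's stop words for `language` as a set.

    `load_stopwords` is a hypothetical helper name, not part of NLTK.
    """
    # Imported here so that defining the helper does not itself require NLTK.
    import nltk
    from nltk.corpus import stopwords

    try:
        # Check whether the corpus is already on disk.
        nltk.data.find("corpora/stopwords")
    except LookupError:
        # First run on this machine: fetch the corpus once.
        nltk.download("stopwords")

    # A set makes later membership tests fast.
    return set(stopwords.words(language))


# Usage (replaces `stopwords = set(STOPWORDS)` from the question):
# stop_words = load_stopwords()
```

Storing the result under a different name, such as stop_words, also avoids overwriting the imported stopwords module with the set.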
CodePudding user response:
As pointed out earlier, the first time you run your code you need to include the following line to download the list to your computer:
nltk.download('stopwords')
Then, you can load, for example, the English stop words list as follows:
stop_words = list(stopwords.words('english'))
and even extend it, if you need to:
stop_words.extend(["best", "item", "fast"])
Use it to remove stop words from text:
from nltk.tokenize import word_tokenize
# tokenise the text (`text` is the string you want to clean) and remove stop words
word_tokens = word_tokenize(text)
clean_word_data = [w for w in word_tokens if w.lower() not in stop_words]
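The filtering step can also be tried without any downloads. The sketch below is not the NLTK version: it assumes a tiny hand-written stop word sample and a simple regular expression in place of word_tokenize, and the names SAMPLE_STOP_WORDS and remove_stop_words are invented for the example.

```python
import re

# Hand-picked sample for illustration only. In real code this would be
# stopwords.words('english') plus any words you add yourself.
SAMPLE_STOP_WORDS = {"this", "is", "the", "a", "and", "best", "item", "fast"}


def remove_stop_words(text, stop_words):
    """Return the tokens of `text` that are not stop words (case-insensitive)."""
    # Converting to a set keeps each membership test cheap, even for long lists.
    stop_set = {w.lower() for w in stop_words}
    # Simple stand-in for word_tokenize: runs of letters and apostrophes.
    tokens = re.findall(r"[A-Za-z']+", text)
    return [t for t in tokens if t.lower() not in stop_set]


print(remove_stop_words("This is the best item and it shipped fast", SAMPLE_STOP_WORDS))
# prints ['it', 'shipped']
```

If stop_words is a long list and the text is large, converting it to a set once, as done here, is usually noticeably faster than checking against the list for every token.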
