I have a list of 200,000 words, a list of indexes, and a keyword. The index list (indexList) is not predefined and can have any length from 0 to len(keyword).
I want to iterate through the 200,000 words and keep only the ones that have the keyword's letter at each of the given indexes.
Examples:
keyword = "BEANS"
indexList = [0, 3]
I want to keep words that have 'B' at index 0 and 'N' at index 3.
keyword = "BEANS"
indexList = [0, 1, 2]
I want to keep words that have 'B' at index 0, 'E' at index 1, and 'A' at index 2.
keyword = "BEANS"
indexList = []
No constraints, so return all 200,000 words.
At the moment I have the following code, where sampleSpace is the list of 200,000 words:
```python
extractedList = []
for i in range(len(indexList)):
    for word in sampleSpace:
        if (word[indexList[i]] == keyword[indexList[i]]):
            extractedList.append(word)
```
However, this code keeps words that match at the first index OR at the second index OR at the Nth index, and it appends a word once for every index it matches, so duplicates can appear.
I need words that have ALL of the letters at their specific indexes.
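To see the difference concretely, here is a small sketch on a made-up five-word list (the `sampleSpace` contents are hypothetical, not the real 200,000 words). The loop from the question behaves like OR and can append the same word more than once; testing every index per word with `all()` behaves like AND.

```python
keyword = "BEANS"
indexList = [0, 3]
# Hypothetical stand-in for the 200,000-word list.
sampleSpace = ["BLAND", "BRICK", "CRANE", "BEING", "STONE"]

# Loop from the question: outer loop over indexes, so a word is kept if it
# matches at ANY index, and it is appended once per matching index.
orExtracted = []
for i in range(len(indexList)):
    for word in sampleSpace:
        if word[indexList[i]] == keyword[indexList[i]]:
            orExtracted.append(word)

# Per-word check: a word is kept only if it matches at EVERY index.
andExtracted = [word for word in sampleSpace
                if all(word[i] == keyword[i] for i in indexList)]

print(orExtracted)   # ['BLAND', 'BRICK', 'BEING', 'BLAND', 'CRANE', 'BEING', 'STONE']
print(andExtracted)  # ['BLAND', 'BEING']
```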
Answer:
You can use a simple list comprehension with all(). The comprehension loops over every word in the big word list, and all() checks every index in indexList:
```python
>>> from wordle_solver import wordle_corpus as corpus
>>> keyword = "BEANS"
>>> indexList = [0, 3]
>>> [word for word in corpus if all(keyword[i] == word[i] for i in indexList)]
['BLAND', 'BRUNT', 'BUNNY', 'BLANK', 'BRINE', 'BLEND', 'BLINK', 'BLUNT', 'BEING', 'BRING', 'BRINY', 'BOUND', 'BLOND', 'BURNT', 'BORNE', 'BRAND', 'BRINK', 'BLIND']
```
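`wordle_solver` appears to be the answerer's own module, so the session above will not run as-is elsewhere. A self-contained sketch of the same comprehension, assuming a small hypothetical `corpus` list in its place:

```python
keyword = "BEANS"
indexList = [0, 3]
# Hypothetical small corpus standing in for wordle_solver.wordle_corpus.
corpus = ["BLAND", "BRUNT", "CRANE", "BEING", "BLOCK", "BOUND"]

# Keep a word only if it agrees with the keyword at every index in indexList.
matches = [word for word in corpus
           if all(keyword[i] == word[i] for i in indexList)]
print(matches)  # ['BLAND', 'BRUNT', 'BEING', 'BOUND']
```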
Answer:
First, restructure your logic so that the outer loop is for word in sampleSpace. You want to consider one word at a time and check all the relevant indexes in that word.
Next, look up the all() function, which returns True if every element of the iterable you pass it is truthy. How can we apply this here? We want to check whether
```python
all(
    word[index] == keyword[index] for index in indexList
)
```
So we have:
```python
extractedWords = []
for word in sampleSpace:
    if all(word[index] == keyword[index] for index in indexList):
        extractedWords.append(word)
```
Since this loop only builds a list, we can write it as a list comprehension:
```python
extractedWords = [word
                  for word in sampleSpace
                  if all(word[index] == keyword[index] for index in indexList)
                  ]
```
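A quick sketch of two properties of all() that matter here (the word and indexes are illustrative values, not from the question): it stops at the first False, so later indexes are never evaluated, and it returns True for an empty iterable, so an empty indexList already keeps every word even without a special case.

```python
word, keyword = "BLAND", "BEANS"

# Index 1 fails ('L' != 'E'), so all() stops there and never evaluates
# word[10], which would otherwise raise IndexError on a 5-letter word.
print(all(word[index] == keyword[index] for index in [1, 10]))  # False

# An empty iterable counts as "all true".
print(all(word[index] == keyword[index] for index in []))  # True
```

The explicit empty-list check in the function below therefore mainly saves a pass over the list and lets you return the original list object.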
You can handle an empty indexList separately with an if check before doing any of this:
```python
def search_keyword_index(sampleSpace, keyword, indexList):
    # No indexes to check: every word qualifies.
    if not indexList:
        return sampleSpace  # or return sampleSpace[:] if you need to return a copy
    # Keep words that match the keyword at every requested index.
    return [word for word in sampleSpace
            if all(word[index] == keyword[index] for index in indexList)]
```
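One thing the function above does not cover: a word shorter than the largest index raises IndexError. Below is a sketch of a guarded variant, assuming short words should simply be skipped; the sample `words` list is hypothetical. It also runs the three examples from the question.

```python
def search_keyword_index(sampleSpace, keyword, indexList):
    """Return the words that match keyword at every position in indexList."""
    if not indexList:
        return list(sampleSpace)  # every word qualifies; return a copy
    # Assumption: words too short to have every index are skipped
    # instead of raising IndexError.
    needed = max(indexList) + 1
    return [word for word in sampleSpace
            if len(word) >= needed
            and all(word[index] == keyword[index] for index in indexList)]

# Hypothetical sample standing in for the 200,000-word list.
words = ["BEAST", "BEANS", "BLAND", "BEING", "CRANE", "BE"]
print(search_keyword_index(words, "BEANS", [0, 3]))     # ['BEANS', 'BLAND', 'BEING']
print(search_keyword_index(words, "BEANS", [0, 1, 2]))  # ['BEAST', 'BEANS']
print(search_keyword_index(words, "BEANS", []))         # all six words
```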
Answer:
You can build a set of (index, character) pairs from the keyword and quickly test each word in your list by checking whether that set is a subset of the word's enumerate() pairs:
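To make the idea concrete, here is a small sketch on two sample words (the words are illustrative choices): enumerate(word) yields (index, character) pairs, and set.issubset accepts any iterable, so no explicit set() conversion is needed. A word shorter than the largest index simply lacks that pair, so it fails the test instead of raising IndexError.

```python
keyword = "ELEPHANT"
indexList = [0, 3, 5, 7]

# The (index, character) pairs a word must contain.
letterSet = {(i, keyword[i]) for i in indexList}
print(sorted(letterSet))  # [(0, 'E'), (3, 'P'), (5, 'A'), (7, 'T')]

print(letterSet.issubset(enumerate("EGGPLANT")))  # True
# "ELEGANT" has 'G' at index 3 and no index 7 at all.
print(letterSet.issubset(enumerate("ELEGANT")))   # False
```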
```python
with open("/usr/share/dict/words") as f:
    words = f.read().upper().split('\n')  # 235,887 words

keyword = "ELEPHANT"
indexList = [0, 3, 5, 7]

letterSet = {(i, keyword[i]) for i in indexList}
for word in words:
    if letterSet.issubset(enumerate(word)):
        print(word)
```
```
EGGPLANT
ELEPHANT
ELEPHANTA
ELEPHANTIAC
ELEPHANTIASIC
ELEPHANTIASIS
ELEPHANTIC
ELEPHANTICIDE
ELEPHANTIDAE
ELEPHANTINE
ELEPHANTLIKE
ELEPHANTOID
ELEPHANTOIDAL
ELEPHANTOPUS
ELEPHANTOUS
ELEPHANTRY
EPIPLASTRAL
EPIPLASTRON
```
You could place the result in a list using a comprehension:
```python
eligible = [word for word in words if letterSet.issubset(enumerate(word))]
print(len(eligible))  # 18
```
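/usr/share/dict/words is not present on every system, so here is a self-contained sketch that uses a hypothetical in-memory list instead and checks that the set-based test agrees with the all()-based test from the other answers (the latter needs a length guard so short words such as "EAT" do not raise IndexError).

```python
keyword = "ELEPHANT"
indexList = [0, 3, 5, 7]
# Hypothetical in-memory word list replacing /usr/share/dict/words.
words = ["EGGPLANT", "ELEPHANT", "ELEPHANTINE", "EPIPLASTRON", "ELEGANT", "EAT"]

# Set-based test: every required (index, character) pair must appear.
letterSet = {(i, keyword[i]) for i in indexList}
viaSet = [word for word in words if letterSet.issubset(enumerate(word))]

# all()-based test, guarded so words shorter than the largest index are skipped.
viaAll = [word for word in words
          if len(word) > max(indexList, default=-1)
          and all(word[i] == keyword[i] for i in indexList)]

print(viaSet)            # ['EGGPLANT', 'ELEPHANT', 'ELEPHANTINE', 'EPIPLASTRON']
print(viaSet == viaAll)  # True
```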
