How to match a list of strings in a string, returning words match along with its frequency? (in pyth

Time:01-30

I have a string.

string = "there is a good recipe for excellent good taste"

I need to match the following list of words to the above string.

words = ['good', 'excellent', 'good taste']

Output expected:

{'good': 1, 'excellent': 1, 'good taste': 1}

Kindly note: 'good' should not be counted twice, because its second occurrence is part of 'good taste'. I need a solution in Python.

CodePudding user response:

Try this one:

words = sorted(words, key=len, reverse = True)
freq = {}
for word in words:
    freq[word] = string.count(word)
    string = string.replace(word, "")
print(freq)

The idea is to store the frequency of the longer word first and then replace it with an empty string.
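One caveat with replacing by an empty string: the text on either side of a removed match gets joined, which can create a match that was never in the original. Below is a minimal sketch of a variant that masks each match with a placeholder instead. The function name `count_longest_first` and the `"\0"` placeholder are assumptions for illustration, and the sketch assumes the placeholder never occurs in the text or in the words.

```python
def count_longest_first(text, words, placeholder="\0"):
    """Count each word, longest first, masking matches so they are not recounted."""
    freq = {}
    for word in sorted(words, key=len, reverse=True):
        freq[word] = text.count(word)
        # Mask rather than delete, so neighbouring text is never joined together.
        text = text.replace(word, placeholder)
    return freq


print(count_longest_first("there is a good recipe for excellent good taste",
                          ["good", "excellent", "good taste"]))
# {'good taste': 1, 'excellent': 1, 'good': 1}

# With plain deletion, removing "ab" from "cabd" would leave "cd" and count it.
print(count_longest_first("cabd", ["ab", "cd"]))
# {'ab': 1, 'cd': 0}
```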

CodePudding user response:

I suspect you are after something like this. Since you want 'good taste' to be counted ahead of 'good', I assume the order of precedence is by length rather than by the order given. Here are examples of both, though.

def count_by_length(s: str, words: list[str]) -> dict:
    counts = {}
    for w in sorted(words, key=len, reverse=True):
        counts[w] = s.count(w)
        s = s.replace(w, "")
    return counts


def count_by_order(s: str, words: list[str]) -> dict:
    counts = {}
    for w in words:
        counts[w] = s.count(w)
        s = s.replace(w, "")
    return counts


if __name__ == "__main__":
    s = "there is a good recipe for excellent good taste"
    words = ["good", "excellent", "good taste"]
    print(count_by_length(s, words))
    print(count_by_order(s, words))

yielding:

{'good taste': 1, 'excellent': 1, 'good': 1}
{'good': 2, 'excellent': 1, 'good taste': 0}

In both cases, the concept is similar:

  1. Take a string and list of substrings.
  2. Count the instances of a substring (which depends on order).
  3. Remove those instances of the substring.
  4. Add the count to the output dictionary.

In count_by_length, we first sort the list by the length of the substrings. The builtin sorted takes a key argument, and len works nicely for our purposes. Passing reverse=True puts the longest first.
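The same longest-first idea can also be expressed as a single regular expression, which additionally avoids counting 'good' inside a larger word such as 'goods'. This is a sketch of an alternative technique, not the approach used in the answers here. The name `count_phrases` is hypothetical, and the sketch assumes every phrase starts and ends with a word character so that `\b` behaves as intended.

```python
import re
from collections import Counter


def count_phrases(text, phrases):
    """Count whole-word, non-overlapping phrase matches, preferring longer phrases."""
    # Longest alternatives first: at a given position the regex engine tries
    # alternatives left to right, so "good taste" is tried before "good".
    ordered = sorted(phrases, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(p) for p in ordered) + r")\b"
    found = Counter(re.findall(pattern, text))
    # Report in the caller's order, with 0 for phrases that never matched.
    return {p: found[p] for p in phrases}


s = "there is a good recipe for excellent good taste"
print(count_phrases(s, ["good", "excellent", "good taste"]))
# {'good': 1, 'excellent': 1, 'good taste': 1}
```

Because `re.findall` scans left to right and never reuses matched text, no replacement step is needed.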


A Bug

But each of these still suffers from a flaw that doesn't show up with your example. Which is longer: "a management" or "man in a"? By character count, "a management". But if what you really care about is the number of words in a token, you need to count those instead. You can do that with the same method by using a lambda as the key, so that sorting is by number of words.

def count_by_num(s: str, words: list[str]) -> dict:
    counts = {}
    for w in sorted(words, key=lambda x: len(x.split()), reverse=True):
        counts[w] = s.count(w)
        s = s.replace(w, "")
    return counts

if __name__ == "__main__":
    s = "man in a management"
    words = ["a management", "man in a"]
    print(count_by_length(s, words)) # not actually what we meant
    print(count_by_num(s, words))    # the real answer, probably

Which yields:

{'a management': 1, 'man in a': 0}
{'man in a': 1, 'a management': 0}
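If two tokens have the same number of words, sorting by word count alone leaves them in their original order. One possible refinement, sketched here as an assumption rather than as part of the answer above, is to sort on a tuple so that character length breaks ties. The name `count_by_words_then_length` is hypothetical.

```python
def count_by_words_then_length(s, words):
    """Count tokens, giving precedence to more words, then to more characters."""
    counts = {}
    # Sort key is a tuple: number of words first, character length as tie-breaker.
    ordered = sorted(words, key=lambda w: (len(w.split()), len(w)), reverse=True)
    for w in ordered:
        counts[w] = s.count(w)
        s = s.replace(w, "")
    return counts


print(count_by_words_then_length("man in a management", ["a management", "man in a"]))
# {'man in a': 1, 'a management': 0}
```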

@ABHISHEK TIBREWAL had a much more terse implementation here.

CodePudding user response:

To create the dictionary you can use this code (it special-cases 'good' so that the occurrence inside 'good taste' is not counted towards it):

print({word:(string if word != 'good' else string.replace("good taste","")).count(word) for word in words})
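The one-liner hard-codes the 'good' / 'good taste' pair. A more general sketch of the same idea, under the assumption that a word should never be counted inside any longer listed word that contains it, might look like this. The name `count_excluding_longer` is hypothetical.

```python
def count_excluding_longer(text, words):
    """Count each word after removing every longer listed word that contains it."""
    result = {}
    for word in words:
        remaining = text
        # Other entries that contain this word, removed longest first.
        containers = sorted((w for w in words if w != word and word in w),
                            key=len, reverse=True)
        for longer in containers:
            remaining = remaining.replace(longer, "")
        result[word] = remaining.count(word)
    return result


string = "there is a good recipe for excellent good taste"
words = ['good', 'excellent', 'good taste']
print(count_excluding_longer(string, words))
# {'good': 1, 'excellent': 1, 'good taste': 1}
```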

CodePudding user response:

string = "there is a good recipe for excellent good taste"
words = ['good', 'excellent', 'good taste']
result_dict = {}

To avoid counting "good" twice, we search for the longer words first, one at a time, and remove every occurrence we find from the string before looking for the next word.

for word in sorted(words, key=len, reverse=True):
    print('the word "{}" appears {} times'.format(word, string.count(word)))
    result_dict[word] = string.count(word)
    # cut out every occurrence of the word so shorter words cannot match inside it
    while string.find(word) >= 0:
        string = string[:string.find(word)] + string[string.find(word) + len(word):]
    print("after removing word we've found the string became :'{}'".format(string))
print(result_dict)