How to pick words from a string that match a criterion

Time:01-23

I have a string containing many words. I want to store all the words that start with "a" in a list.

CodePudding user response:

yourlist = [word for word in yourstring.split() if word.startswith("a")]
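
Note that startswith("a") is case-sensitive, so a word such as "Alice" would be skipped. A minimal sketch of both behaviours follows; the sample string and the names lower_only and any_case are made up for illustration and are not from the question.

```python
# Sample text invented for illustration.
yourstring = "Alice ate an apple and a banana"

# Case-sensitive: only words beginning with a lowercase "a".
lower_only = [word for word in yourstring.split() if word.startswith("a")]

# Case-insensitive: str.startswith also accepts a tuple of prefixes.
any_case = [word for word in yourstring.split() if word.startswith(("a", "A"))]

print(lower_only)
print(any_case)
```

Which variant is appropriate depends on whether capitalised words should count as matches.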

CodePudding user response:

It'll be easier if first we turn the string into a list of words.

words = s.split()

We'll make a new list called a_words, which will contain all the words that start with 'a' (in either case):

a_words = []
for word in words:
    if word[0].lower() == 'a':
        a_words.append(word)

And now we're done, but this can be simplified. In Python 3, filter() returns an iterator, so wrap it in list() to get a list:

words = s.split()
a_words = list(filter(lambda word: word[0].lower() == 'a', words))
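
For context on the filter() version: in Python 3, filter() returns a lazy iterator rather than a list, which is why the result is normally wrapped in list() before it is printed or indexed. A small sketch, assuming an invented sample string:

```python
s = "An ant ate all the bread"  # sample text, made up for illustration

words = s.split()
lazy = filter(lambda word: word[0].lower() == 'a', words)

# Nothing has been filtered yet: lazy is a filter object, not a list.
is_list_before = isinstance(lazy, list)

# list() consumes the iterator and materialises the matches.
a_words = list(lazy)
print(a_words)
```

Indexing word[0] is safe here because split() with no arguments never produces empty strings.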

CodePudding user response:

Maybe a little bit more complicated than the other answers, but this works as well.

text = "Happy Ape comes around"
a_words = []  # avoid naming this "list", which would shadow the built-in

for i in text.split():
    x = i.lower()
    if x.startswith("a"):
        a_words.append(i)

print(a_words)

Output: ['Ape', 'around']

CodePudding user response:

You should have a look at the regular expressions (re) module.

It is documented in the Python standard library reference.

The method re.findall(...) will find all the expressions matching the regex and return a list of them.

import re

string = "Ah, this is an absolutely amazing example"
# Use a raw string so that \s and \w are not treated as string escapes.
re.findall(r'(?:\s|^)((?=a|A)\w*)', string)
# -> ["Ah", "an", "absolutely", "amazing"]

Here you select any word starting with 'a' or 'A'. Explanation:

  • (?:\s|^) :
    • (?: ... ) non-capturing group, so the leading whitespace is not part of the returned string
    • \s any whitespace character (space, tab, newline)
    • | ^ or the start of the string
  • ((?=a|A)\w*)
    • (?=a|A) positive lookahead: the next character must be 'a' or 'A'
    • \w* zero or more word characters
    • ( ... ) capturing group; its content is the string that findall returns.
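
As a possible variation on the pattern explained above (not the answerer's original), a word boundary \b combined with the re.IGNORECASE flag gives a shorter expression. The two are not strictly equivalent: \b also matches right after punctuation such as an opening parenthesis, whereas the whitespace-based pattern does not. The second sample string below is invented to show that difference.

```python
import re

string = "Ah, this is an absolutely amazing example"

# \b  : word boundary, so the match must begin at the start of a word
# a   : the letter 'a' (or 'A', because of re.IGNORECASE)
# \w* : the rest of the word
boundary_pattern = r"\ba\w*"
a_words = re.findall(boundary_pattern, string, flags=re.IGNORECASE)
print(a_words)

# Difference from the whitespace-based pattern: a word preceded by "(".
whitespace_pattern = r"(?:\s|^)((?=a|A)\w*)"
tricky = "(apple) pie"
with_boundary = re.findall(boundary_pattern, tricky, flags=re.IGNORECASE)
with_whitespace = re.findall(whitespace_pattern, tricky)
print(with_boundary, with_whitespace)
```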

--EDIT--
After a quick benchmark, the regex method actually seems to be slower than the filter method proposed by @Botahamec:

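import re
from datetime import timedelta
from timeit import default_timer as timer

def regex_method(s):
    start = timer()
    re.findall(r'(?:\s|^)((?=a|A)\w*)', s)
    end = timer()
    return timedelta(seconds=end - start)

def filter_method(s):
    start = timer()
    words = s.split()
    # Note: in Python 3, filter() is lazy, so the lambda is not run here
    # unless the result is consumed (e.g. with list()).
    a_words = filter(lambda word: word[0].lower() == 'a', words)
    end = timer()
    return timedelta(seconds=end - start)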

len(s)
# -> 27000000

print(regex_method(s))
# -> 0:00:01.231664
print(filter_method(s))
# -> 0:00:00.759951
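
Because filter() is lazy in Python 3, filter_method above times little more than s.split(); the lambda is never called. A hedged sketch of a fairer comparison follows, using timeit and forcing each approach to build its full result list. The sample text and the repeat count are assumptions chosen for illustration, and no timings are claimed here since they depend on the machine and the Python version.

```python
import re
import timeit

# Sample text made up for illustration; it has no punctuation, so both
# approaches return the same words.
s = "an apple and a banana are all on the table " * 10_000

def regex_method(text):
    return re.findall(r'(?:\s|^)((?=a|A)\w*)', text)

def filter_method(text):
    # list() forces the lazy filter object to do the actual work.
    return list(filter(lambda word: word[0].lower() == 'a', text.split()))

# number=5 is an arbitrary repeat count.
regex_seconds = timeit.timeit(lambda: regex_method(s), number=5)
filter_seconds = timeit.timeit(lambda: filter_method(s), number=5)

print(f"regex:  {regex_seconds:.3f} s")
print(f"filter: {filter_seconds:.3f} s")
```

On text with punctuation the two methods can return different strings (for example "Ah" versus "Ah,"), so it is worth checking that they agree on the input before comparing their speed.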