I have a variable foo, which points to a string, "bar"
foo = "bar"
I have a list, called whitelist.
If whitelist is not empty, only the strings it contains are permitted.
If whitelist is empty, then the if statement should permit any string.
I have implemented this as follows:
whitelist = ["bar", "baz", "x", "y"]
if whitelist and foo in whitelist:
    print("bar is whitelisted")
    # do something with whitelisted element
if whitelist, by my understanding, checks whether whitelist is truthy. An empty list evaluates to False, and a list that contains elements evaluates to True.
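The truthiness rule described above can be checked directly. The sketch below is only an illustration, and the helper names `strict_check` and `permissive_check` are hypothetical. It also shows that the condition `whitelist and foo in whitelist` rejects every string when the whitelist is empty, whereas `not whitelist or foo in whitelist` would let an empty whitelist permit any string.

```python
# Empty lists are falsy, non-empty lists are truthy.
print(bool([]))        # False
print(bool(["bar"]))   # True


def strict_check(foo, whitelist):
    # Same condition as in the question: an empty whitelist permits nothing,
    # because `and` short-circuits on the falsy empty list.
    return bool(whitelist) and foo in whitelist


def permissive_check(foo, whitelist):
    # Hypothetical variant: an empty whitelist permits any string.
    return not whitelist or foo in whitelist


print(strict_check("bar", []))       # False
print(permissive_check("bar", []))   # True
```

With a non-empty whitelist both helpers behave identically; they differ only in the empty case.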
However, the real implementation of this contains:
- lots of strings to check, e.g. "bar", "baz", "x", "y", "a", "b"
- lots of whitelists to check against
Therefore, I was wondering if there is a more computationally efficient way of writing the if statement. It seems like checking the existence of whitelist each time is inefficient, and could be simplified.
CodePudding user response:
These are some ways to check whether an element is in a list or not.
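from timeit import timeit
import numpy as np

whitelist1 = {"bar", "baz", "x", "y"}
# NumPy version of the whitelist; it must be sorted for np.searchsorted in func4
whitelist2 = np.array(["bar", "baz", "x", "y"])

def func1():
    # set intersection with a one-element set
    return {"foo"}.intersection(whitelist1)

def func2():
    # plain membership test on a set
    return "foo" in whitelist1

def func3():
    # np.isin needs an array-like; a set would be treated as one opaque object
    return np.isin("foo", whitelist2)

def func4():
    # binary search; raises IndexError if the word sorts after every entry
    return whitelist2[np.searchsorted(whitelist2, "foo")] == "foo"

print("func1=", timeit(func1, number=100000))
print("func2=", timeit(func2, number=100000))
print("func3", timeit(func3, number=100000))
print("func4=", timeit(func4, number=100000))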
Time Taken by each function
func1= 0.01365450001321733
func2= 0.005112499929964542
func3 0.5342871999600902
func4= 0.17057700001168996
For a randomly generated list:
from timeit import timeit
import numpy as np
import random as rn
from string import ascii_letters

# build a list of 1000 unique random 5-letter words
randomLst = []
while len(randomLst) != 1000:
    word = ''.join(rn.choices(ascii_letters, k=5))
    if word not in randomLst:
        randomLst.append(word)

whitelist1 = {*randomLst}
# sorted, because np.searchsorted in func4 assumes a sorted array
whitelist2 = np.sort(np.array(randomLst))
randomWord = rn.choice(randomLst)
# up to 100 words: choices() samples with replacement, so duplicates collapse
randomWords = set(rn.choices(randomLst, k=100))

def func1():
    return {randomWord}.intersection(whitelist1)

def func2():
    return randomWord in whitelist1

def func3():
    # np.isin needs an array-like, so use the array rather than the set
    return np.isin(randomWord, whitelist2)

def func4():
    return whitelist2[np.searchsorted(whitelist2, randomWord)] == randomWord

def func5():
    # checks all words in randomWords in a single set intersection
    return randomWords & whitelist1

print("func1=", timeit(func1, number=100000))
print("func2=", timeit(func2, number=100000))
print("func3", timeit(func3, number=100000))
print("func4=", timeit(func4, number=100000))
# number is 1000 here because each call checks 100 words at once: 100000 / 100 = 1000
print("func5=", timeit(func5, number=1000))
Time taken
func1= 0.012835499946959317
func2= 0.005004600039683282
func3 0.5219665999757126
func4= 0.19900090002920479
func5= 0.0019264000002294779
Conclusion
If you only need to check a single word, the 'in' operator on a set is the fastest (func2).
If you have a collection of words to check, the '&' set intersection is the fastest (func5).
Note: func5 returns a set containing the words that are in the whitelist.
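As a rough illustration of how the '&' approach might be applied to the original problem (many strings, many whitelists, and an empty whitelist permitting everything), here is a minimal sketch. The function name `filter_allowed` and the example whitelist names are assumptions made for this example only.

```python
def filter_allowed(words, whitelist):
    """Return the words permitted by whitelist; an empty whitelist permits all."""
    words = set(words)
    if not whitelist:
        # Empty whitelist: nothing is filtered out.
        return words
    # Non-empty whitelist: keep only the words that appear in it.
    return words & set(whitelist)


# Hypothetical data: several whitelists checked against the same words.
whitelists = {
    "first": {"bar", "baz"},
    "second": set(),
}
words = ["bar", "a", "x"]

results = {name: filter_allowed(words, wl) for name, wl in whitelists.items()}
print(results["first"])   # {'bar'}
```

If the whitelists are already stored as sets, the `set(whitelist)` call is cheap but could be dropped entirely.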
CodePudding user response:
The name whitelist will always exist, but if it can be None, coerce it to an empty list with:
whitelist = whitelist or []
As shown above, you can then simply use foo in whitelist to find out whether it is in the list. This is an O(len(whitelist)) operation. In practice, list scans are surprisingly fast, even with a whitelist of 1,000 or so elements.
If you need it to be faster, use a set, where a single lookup is O(1) on average. Optionally, if you need to do n lookups, collect your foos into a set and use an intersection, which is O(n):
foos = { 'bar', 'none' }
whitelist = { 'bar' }
for foo in foos & whitelist:
    print(foo)
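The two ideas above (coercing None and switching to a set) can be combined so that each whitelist is converted once and then reused for every lookup. This is only a sketch; the helper name `as_lookup` and the sample data are hypothetical.

```python
def as_lookup(whitelist):
    """Return a frozenset for fast membership tests; None becomes empty."""
    return frozenset(whitelist or ())


# Hypothetical input: whitelists may be lists, empty, or None.
raw_whitelists = [["bar", "baz"], None, []]

# Convert once up front, then reuse the sets for every lookup.
lookups = [as_lookup(w) for w in raw_whitelists]

foos = {"bar", "none"}
matches = [sorted(foos & lookup) for lookup in lookups]
print(matches)  # [['bar'], [], []]
```

Building the set costs O(len(whitelist)) once, so it pays off when the same whitelist is checked many times.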
CodePudding user response:
Here is a simplified solution. You can do it in two ways:
whitelist = ["bar", "baz", "x", "y"]
foo = "bar"
# method 1
def WhiteListExists(foo, whitelist):
    if whitelist and foo in whitelist:
        return True
    else:
        return False
exists = WhiteListExists(foo,whitelist)
# method 2
exists = True if whitelist and foo in whitelist else False
Both methods give the same result, but the second one is faster because it avoids the overhead of a function call.
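To see the difference on your own machine, the two methods can be timed with timeit. This is a rough sketch rather than a rigorous benchmark: the wrapper names `call_check` and `inline_check` are hypothetical, and the absolute numbers will vary between machines and Python versions.

```python
from timeit import timeit

whitelist = ["bar", "baz", "x", "y"]
foo = "bar"


def whitelist_exists(foo, whitelist):
    # Method 1: the check wrapped in a helper function.
    return bool(whitelist and foo in whitelist)


def call_check():
    # Pays for an extra function call on every check.
    return whitelist_exists(foo, whitelist)


def inline_check():
    # Method 2: the conditional expression written inline.
    return True if whitelist and foo in whitelist else False


t_call = timeit(call_check, number=100_000)
t_inline = timeit(inline_check, number=100_000)
print("method 1 (function call):", t_call)
print("method 2 (inline):", t_inline)
```

Note that `bool(whitelist and foo in whitelist)` is equivalent to the `True if ... else False` form.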
