I am calling a Flask server (running with 64 gunicorn workers) over HTTP from a ThreadPoolExecutor to measure execution time.
Both the test script and the Flask server run on the same host. Inside the Flask endpoint there is a DB (Postgres) call.
Machine stats: 11 cores, 16 GB RAM.
Test script:
import logging
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime

logging.basicConfig(level=logging.INFO)


class StressTest:
    def __init__(self, max_workers):
        self.__max_workers = max_workers
        self.logger = logging.getLogger(__name__)

    def __async_test(self, x=range(1000)):
        # Submit one request per item, then wait for all of them to finish.
        with ThreadPoolExecutor(max_workers=self.__max_workers) as executor:
            futures = [
                executor.submit(
                    self.__test,
                    n=n
                )
                for n in x
            ]
            for future in futures:
                # result() re-raises any exception raised in the worker thread.
                future.result()

    def __test(self, n):
        # Rest call
        pass

    def run(self):
        start_time = datetime.now()
        self.__async_test()
        duration = datetime.now() - start_time
        self.logger.info(f"duration for the process is {duration.total_seconds()} seconds")


if __name__ == "__main__":
    StressTest(1).run()
    StressTest(2).run()
    StressTest(4).run()
    # .......
The results are as follows:
| No. of threads | Execution time (s) |
|---|---|
| 1 | 50 |
| 2 | 25 |
| 4 | 12 |
| 8 | 8.5 |
| 16 | 8 |
| 32 | 7.5 |
| 64 | 7.5 |
My question is: why does the execution time saturate after 8 threads? Isn't it possible to run multiple threads at the same time on a given core?
At any given time, only 11 threads are running (according to htop) during the test.
Please let me know if you need additional information about my test. Thank you!
CodePudding user response:
TL;DR: Regardless of the programming language used, a given core (more precisely, a given hardware thread) can execute only one thread at any given moment.
More details: When we say "thread", we can mean the CPU-level execution entity (in which case the answer is the one above, period), or we can mean so-called "tasks", "async execution", "futures", and so on. All of those terms describe a higher-level abstraction over a thread: they specify the logical steps needed to get the work done, not how the CPU is used directly. However, that abstraction is still executed by CPU threads. And that brings us back to the first statement: a given core can execute only one thread at any given moment.
CodePudding user response:
From the Python documentation for ThreadPoolExecutor:
> It utilizes at most 32 CPU cores for CPU bound tasks which release the GIL
It's that latter condition which should worry you. "GIL" stands for Global Interpreter Lock. Your Python runtime has only one such lock, which is why it is called "global", and a thread must hold it to do pretty much anything that touches Python objects in memory. Even a simple `a = b` can require the GIL. Statements like these therefore do not get faster when more cores are available; there simply aren't more locks.
However, I/O operations are designed so that they do not need the GIL while the OS is working on the operation. In this case, multiple cores could help a bit, but here the OS is typically waiting on other hardware, not on the CPU.
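The difference between the two kinds of work can be seen in a minimal sketch. It assumes a standard CPython build with the GIL enabled; `io_task`, `cpu_task` and `timed_run` are hypothetical helpers invented for this illustration, and `time.sleep` only stands in for waiting on a network or database reply.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def io_task(_):
    # Stand-in for waiting on a socket or DB reply; time.sleep releases the GIL.
    time.sleep(0.05)


def cpu_task(_):
    # Pure-Python arithmetic; the running thread holds the GIL while it computes.
    total = 0
    for i in range(200_000):
        total += i * i
    return total


def timed_run(task, max_workers, n_tasks=8):
    """Run n_tasks copies of task on a thread pool and return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        list(executor.map(task, range(n_tasks)))
    return time.perf_counter() - start


if __name__ == "__main__":
    for name, task in (("io", io_task), ("cpu", cpu_task)):
        one = timed_run(task, max_workers=1)
        eight = timed_run(task, max_workers=8)
        print(f"{name}: 1 thread {one:.2f}s, 8 threads {eight:.2f}s")
```

On such a build, the I/O-style tasks should finish much sooner with 8 threads than with 1, while the CPU-bound tasks should take roughly the same time either way. Exact numbers depend on the machine and the Python version.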
Postgres serves each connection from its own backend process, so several connections can execute several queries in parallel. This would likely give a speed-up for that part of the processing. As noted in the comments, Amdahl's Law tells you that a program with mixed parallel and serial parts will be limited by its serial parts once enough cores are available. And in Python, the GIL behaves as that serial part.
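Amdahl's Law can be written as a short function. The sketch below is only illustrative: the timings are copied from the table in the question, while `ASSUMED_PARALLEL_FRACTION` is a hypothetical value chosen for the example, not something measured.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Speedup predicted by Amdahl's Law for a given parallel share of the work."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def speedup_limit(parallel_fraction):
    """Speedup ceiling as the number of workers grows without bound."""
    return 1.0 / (1.0 - parallel_fraction)


# Timings from the question's table: number of threads -> seconds.
measured_seconds = {1: 50, 2: 25, 4: 12, 8: 8.5, 16: 8, 32: 7.5, 64: 7.5}
observed = {n: measured_seconds[1] / t for n, t in measured_seconds.items()}

# Assumption for illustration only: 85% of the work can run in parallel.
ASSUMED_PARALLEL_FRACTION = 0.85

if __name__ == "__main__":
    for n, speedup in observed.items():
        model = amdahl_speedup(ASSUMED_PARALLEL_FRACTION, n)
        print(f"{n:>2} threads: observed {speedup:.2f}x, model {model:.2f}x")
    print(f"model ceiling: {speedup_limit(ASSUMED_PARALLEL_FRACTION):.2f}x")
```

The measured speedup tops out at about 50 / 7.5, roughly 6.7x. Under this model that ceiling would correspond to a serial share of about 15%, although the measured curve (near-linear up to 4 threads, then flat) does not have to follow the smooth Amdahl curve closely.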
CodePudding user response:
I don't know that library well, but Python also ships with a built-in module called `threading` (no installation needed). Here is a tutorial about it:
https://www.tutorialspoint.com/python/python_multithreading.htm
