I have written the code below to parallelize two 'for' loops.
#include <iostream>
#include <omp.h>
#define SIZE 100
int main()
{
    int arr[SIZE];
    int sum = 0;
    int i, tid, numt, prod;
    double t1, t2;
    for (i = 0; i < SIZE; i++)
        arr[i] = 0;
    t1 = omp_get_wtime();
    #pragma omp parallel private(tid, prod)
    {
        tid = omp_get_thread_num();
        numt = omp_get_num_threads();
        std::cout << "Tid: " << tid << " Thread: " << numt << std::endl;
        #pragma omp for reduction(+ : sum)
        for (i = 0; i < 50; i++) {
            prod = arr[i] + 1;
            sum += prod;
        }
        #pragma omp for reduction(+ : sum)
        for (i = 50; i < SIZE; i++) {
            prod = arr[i] + 1;
            sum += prod;
        }
    }
    t2 = omp_get_wtime();
    std::cout << "Time taken: " << (t2 - t1) << ", Parallel sum: " << sum << std::endl;
    return 0;
}
Here the first 'for' loop is executed in parallel by all the threads and its result is accumulated in the sum variable. Only after the first loop has finished do the threads start executing the second 'for' loop in parallel, again accumulating into sum. So the second loop clearly waits for the first one to complete.
I want the two 'for' loops to be processed simultaneously across the threads. How can I do that? Is there a more efficient way to write this code? Please ignore the dummy work inside the 'for' loops.
Answer:
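You can add the nowait clause to both loops and move the reduction clause to the enclosing parallel construct, so the per-thread partial sums are combined once at the end of the parallel region. Something like this: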
#pragma omp parallel private(tid, prod) reduction(+ : sum)
{
    #pragma omp for nowait
    for (i = 0; i < 50; i++) {
        prod = arr[i] + 1;
        sum += prod;
    }
    #pragma omp for nowait
    for (i = 50; i < SIZE; i++) {
        prod = arr[i] + 1;
        sum += prod;
    }
}
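For reference, the same idea can be packaged as a self-contained function. This is only a minimal sketch: the function name sum_two_loops_nowait and the use of std::vector are assumptions made for illustration, and the loop body is the dummy work from the question.

```cpp
#include <vector>

// Hypothetical helper (name and signature are assumptions, not from the
// question): sums (arr[i] + 1) over the two halves of arr.
int sum_two_loops_nowait(const std::vector<int>& arr) {
    const int n = static_cast<int>(arr.size());
    const int half = n / 2;
    int sum = 0;

    // Each thread works on a private copy of sum; the copies are combined
    // into the shared sum once, at the end of the parallel region.
    #pragma omp parallel reduction(+ : sum)
    {
        // nowait removes the barrier at the end of this loop, so a thread
        // that finishes its share moves straight on to the second loop.
        #pragma omp for nowait
        for (int i = 0; i < half; i++) {
            sum += arr[i] + 1;
        }

        // The barrier at the end of the parallel region still applies,
        // so nowait here only avoids a redundant second barrier.
        #pragma omp for nowait
        for (int i = half; i < n; i++) {
            sum += arr[i] + 1;
        }
    }
    return sum;
}
```

Called with a 100-element vector of zeros, this should return 100, the same value the question's program prints. If the code is compiled without OpenMP support, the pragmas are ignored and the function simply runs serially.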
Answer:
If you use #pragma omp for nowait, all threads are still assigned to the first loop; a thread starts on the second loop only after it has finished its own share of the first. Unfortunately, there is no way to tell the omp for construct to use only, say, half of the threads.
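Fortunately, you can run the two loops concurrently by using tasks. The following code uses the taskloop construct with the num_tasks clause to split each loop into n/2 tasks, so that roughly half of the threads can work on each loop. Note that a taskloop waits for its own tasks by default (it behaves as if it were enclosed in a taskgroup), so the first taskloop needs the nogroup clause; otherwise the second loop's tasks are not created until the first loop has finished. Also, num_tasks requires a positive value, so this assumes at least two threads. This should do what you intended, but you have to test which solution is faster in your case.
#pragma omp parallel
#pragma omp single
{
    int n = omp_get_num_threads();
    // nogroup: do not wait here, go on and create the second loop's tasks
    #pragma omp taskloop nogroup num_tasks(n/2)
    for (int i = 0; i < 50; i++) {
        //do something
    }
    #pragma omp taskloop num_tasks(n/2)
    for (int i = 50; i < SIZE; i++) {
        //do something
    }
}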
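To make the task-based variant concrete, here is a minimal sketch that fills in the dummy work from the question. The function name sum_two_loops_taskloop and the tasks_per_loop parameter (standing in for n/2 above) are assumptions made for illustration. The sum is protected with an atomic update for simplicity, which is a swapped-in choice rather than something from the answer, and is likely slower than a reduction for real work.

```cpp
#include <vector>

// Hypothetical helper (name and parameters are assumptions):
// tasks_per_loop plays the role of n/2 and must be positive.
int sum_two_loops_taskloop(const std::vector<int>& arr, int tasks_per_loop) {
    const int n = static_cast<int>(arr.size());
    const int half = n / 2;
    int sum = 0;

    #pragma omp parallel
    #pragma omp single
    {
        // One thread creates the tasks; the whole team executes them.
        // nogroup removes the implicit taskgroup, so this thread does not
        // wait for the first loop before creating the second loop's tasks.
        #pragma omp taskloop nogroup num_tasks(tasks_per_loop)
        for (int i = 0; i < half; i++) {
            #pragma omp atomic
            sum += arr[i] + 1;
        }

        #pragma omp taskloop num_tasks(tasks_per_loop)
        for (int i = half; i < n; i++) {
            #pragma omp atomic
            sum += arr[i] + 1;
        }
    }
    // All tasks have completed by the barrier that ends the region above.
    return sum;
}
```

With a 100-element vector of zeros and, for example, tasks_per_loop set to 2, this should return 100. Whether the two loops actually overlap depends on how the runtime schedules the tasks, so timing both variants is still the only way to know which is faster.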
