Home > Net >  Sorting arrays using OpenMP: why do some randomized arrays end up with long numbers and not sorted c
Sorting arrays using OpenMP: why do some randomized arrays end up with long numbers and not sorted c

Time:01-13

Our assignment involves creating a sorting algorithm that sorts an array of randomly generated integers. The array size if set as an argument when executing the program.

For testing, we print the first 10 elements of the already sorted array, along with the execution time.

Our implementation works correctly when we don't insert multi-threading in our function that generates the random array. However, when using parallel code, in about 10% of cases we get an unexpected result. For 12000000 size array as an example:

1st execution output: 0 1 2 2 4 7 7 9 9 9

2nd execution output: 0 1 1 1 1 2 4 4 7 7

3rd execution output: 0 1 1 1 1 2 2 2 4 4

4th execution output: 0 1 1 2 4 7 7 9 12 16

5th execution output: 0 10278907 1671508 1716191 145377 3825599 1265238 859463 6112391 11065992

nth execution output: more expected results and the occasional unexpected result.

At first, I thought the problem was the rand() function we used that wasn't thread safe. So I changed our function from this:

void randomizeArray(int* array, int size, int max_value) {
int i;
#pragma omp parallel for
for (i = 0; i < size; i  ) {
    array[i] = rand() % max_value;
}
}

To this:

void randomizeArray(int* array, int size, int max_value) {
int i;
unsigned int seed = 1;
#pragma omp parallel for
for (i = 0; i < size; i  ) {
    array[i] = rand_r(&seed) % max_value;
}
}

And the results are the same. A bunch of correctly sorted 1-2 digits output and the occasional not sorted array of large integers. Is this related to the randomize function? Or could it be something else?

Thank you in advance.

CodePudding user response:

You are right that the function rand is not guaranteed to be thread-safe and that rand_r should be used instead.

However, your replacement implementation is also not thread-safe. Although the function rand_r itself is thread-safe, you are writing to the variable seed through the function rand_r using several threads without any thread synchronization, which causes undefined behavior.

Even if you assume that writes to unsigned int are atomic on your platform, so that there are partial writes to the same variable that would cause data corruption, you will still have several threads constantly overwriting seed, which will probably sometimes overwrite it with its previous value, so that the next call to rand_r will generate the same "random" value again. That is probably why your posted output has the same "random" value several times in a row.

For this reason, you need every thread to have its own copy of seed. One way to do this is to change the line

#pragma omp parallel for

to:

#pragma omp parallel for private(seed)

However, this will cause every thread's random number generator to use the same seed value, which will cause every thread's pseudo-random number generator (PRNG) to generate the same sequence of random numbers. Depending on the situation, this could be a problem.

If you don't want every thread to generate the same sequence of random numbers, then you can set the seed of every thread's PRNG depending on the return value of omp_get_thread_num(). That way, every thread should have its own seed and generate a different set of random numbers.

However, you would have to set the PRNG's seed outside the for loop, which means that you would have to split the #pragma omp parallel for clause into a #pragma omp parallel and a #pragma omp for clause:

void randomizeArray(int* array, int size, int max_value)
{
    #pragma omp parallel
    {
        unsigned int seed = omp_get_thread_num();
        #pragma omp for
        for ( int i = 0; i < size; i   )
        {
            array[i] = rand_r(&seed) % max_value;
        }
    }
}
  •  Tags:  
  • Related