I have a code block that looks like this
std::vector<uint32_t> flags(n, 0);
#pragma omp parallel for
for(int i = 0; i <v; i ) {
// If any thread finds it true, its true.
// Max value of j is n.
for(auto& j : vec[i])
flags[j] = true;
}
Work based upon the flags.
Is there any need for a mutex ? I understand cache coherence is going to make sure all the write buffers are synchronized, and conflicting buffers will not be written to memory. Secondly the overhead of cache coherence can be avoided by simply changing
flags[j] = true;
to
if(!flags[j]) flags[j] = true;
The check if flags[j] is already set will reduce the write frequency thus need for cache coherency updates. And even if by any chance flags[j] is read to be false it will only end up in one extra write to flags[j] which is okay.
EDIT :
Yes multiple threads may and will try to write to the same index in flags[j]. Hence the question.
uint32_t has intentionally been used and bool is not used since writing to a boolean in parallel can malfunction as the neighboring booleans share the same byte. But writing to the same uint32_t in parallel from different threads will not malfunction in the same manner as booleans even without mutex.
FWIW , to comply with the standards, I ended up keeping this code which more or less complies with the standards not 100% though
#pragma omp parallel
{
std::vector<uint32_t> flags_local(n, 0);
#pragma omp parallel for
for(int i = 0; i <v; i ) {
for(auto& j : vec[i])
flags_local[j] = true;
}
// No omp directive here, as all threads
// need to traverse their full arrays.
for(int j = 0; j <n; i ) {
if(flags_local[j] && !flags[j]) {
#pragma omp critical
{ flags[j] = true; }
}
}
}
CodePudding user response:
Thread safety in C is such that you need not worry about cache coherency and such hardware related issues. What matters is what is specified in the C standard, and I don't think it mentions cache coherency. Thats for the implementers to worry about.
For writing to elements of a std::vector the rules are actually rather simple: Writing to distinct elements of a vector is thread-safe. Only if two threads write to the same index you need to synchronize the access (and for that it does not matter whether both threads write the same value or not).
As pointed out by Evg, I made a rough simplification. What counts is that all threads access different memory locations. Hence, with a std::vector<bool> things wouldn't be that simple, because typically several elements of a std::vector<bool> are stored in a single byte.
Yes multiple treads may and will try to write to the same index in flags[j].
Then you need to synrchonize the access. The fact that all elements are false initially and all writes do write true is not relevant. What counts is that you have multiple threads that access the same memory and at least one writes to it. When this is the case you need to synchronize the access or you have a data race.
CodePudding user response:
Accessing a variable concurrently to a write is a race condition, and so it is undefined behavior.
flags[j] = true;
should be protected.
Alternatively you might use atomic types (but see how-to-declare-a-vector-of-atomic-in-c ).
or even simpler using std::atomic_ref (c 20)
std::vector<uint32_t> flags(n, 0);
#pragma omp parallel for
for (int i = 0; i < v; i ) {
for (auto& flag : vec[i]) {
auto atom_flag = std::atomic_ref<std::uint32_t>(flag);
atom_flag = true;
}
}
