Here’s a problem to solve: you have two threads, each incrementing the same single counter N times. What is the state of the counter at the end of the program?
A straightforward solution
A naïve C++ program can be written using the standard library threads like this:
#include <iostream> #include <thread> int counter; void f() { for (int c = 0; c != 100000; c++) counter++; } int main() { std::thread thread_a {f}; std::thread thread_b {f}; thread_a.join(); thread_b.join(); std::cout << counter << std::endl; }
The two threads, thread_a
and thread_b
, are running the same function f
which increments the counter
in a loop. Threads are running concurrently, and we expect that they will update the counter so that it double the number of loop cycles.
At this point, you face a few surprises. The first one is quite expectable: the program is a random generator number, and it prints some number, which is always not less than 100,000 but never reaches 200,000, staying very close to the lower border:
$ ./a.out 101640 $ ./a.out 118064 $ ./a.out 106088 $ ./a.out 103100 $ ./a.out 108400
The second surprise is that this was the result I got on Mac OS using the g++ compiler. The same compiler on Ubuntu gives an executable file that prints exact 200,000:
$ ./a.out 200000 $ ./a.out 200000 $ ./a.out 200000
If you got similar output, try increasing the overall load on your computer, and most likely you get different numbers again:
$ ./a.out 200000 $ ./a.out 127976 $ ./a.out 135940
Notice that you have to compile the code with the -pthread
flag on Linux (p stands for POSIX):
g++ -pthread -std=c++11 test.cpp
What’s going on under Windows?
Using mutexes
But OK, we made a small sidewalk tour, but the main thing is that the original program is not thread-safe yet. We have to make sure that only a single thread updates the counter at any moment.
First, let us try mutexes:
. . . #include <mutex> int counter; std::mutex mu; void f() { for (int c = 0; c != 100000; c++) { mu.lock(); counter++; mu.unlock(); } } . . .
On each iteration, the mutex is locked to ensure the counter is not read and/or updated by another thread. Did it help? Well, depends on which answer you accept.
The program prints the correct number 200,000 but it does it unbelievably slow:
$ time ./a.out 200000 real 0m0.556s
Obviously, working with a mutex on each iteration is an overhead. You can try moving it out of the loop:
void f() { mu.lock(); for (int c = 0; c != 100000; c++) { counter++; } mu.unlock(); }
It is fast and correct now, but the logic is broken. We did not intend to make all 100,000 iterations without interruption. Now, the second thread starts incrementing the counter only after the first thread did its job completely.
The same slow result gives another approach with mutexes—using templates to let them unlock mutexes at the end of the scope:
int counter; std::mutex mu; void f() { for (int c = 0; c != 100000; c++) { std::unique_lock<std::mutex> lock {mu}; counter++; } }
Correct, but slow answer again. We need something more radical.
Atomic operations in Raku
The Raku programming language, which is a recent rename of Perl 6, provides you with atomic operations that you can apply to special atomic integers. I am referring you to the above two links for the details, and here is the corresponding program in Raku:
my atomicint $counter = 0; sub f() { $counter⚛++ for ^100_000; } await do { start {f} start {f} } say $counter;
The atomic behaviour is forced by the two elements: the atomicint
modifier in the variable declaration and the Unicode modifier for the increment operator: ⚛++
. Without those, the result of the program is not predictable: as earlier, it can be any number between 100,000 and 200,000.
As soon as we start working with the atomic variable, the result is correct, and the program is quite quick:
$ time raku test.raku 200000 real 0m0.303s
But wait, how interpreted Raku can be faster than compiled C++? Just to make it more awkward, let us exclude the compilation time and measure the proper part of runtime only:
. . . my $t0 = now; await do { start {f} start {f} } say now - $t0; . . .
It is now about five times faster than a C++ program with mutexes:
$ raku 168.raku 0.13648206
What is wrong here? The next section gives you the answer.
The true C++ atomic
Atomic operations are the answer in C++ too. Modern C++ gives you a very handy way to make a counter atomic.
All you need is to instantiate the atomic
template:
std::atomic<int> counter;
You can also do it using a predefined type synonym.
std::atomic_int counter;
Just for the reference, here is the whole program again. It is almost the same as the first naïve solution, just with an addition of the <atomic>
header and the atomic counter:
#include <iostream> #include <thread> #include <atomic> std::atomic<int> counter; void f() { for (int c = 0; c != 100000; c++) { counter++; } } int main() { std::thread thread_a {f}; std::thread thread_b {f}; thread_a.join(); thread_b.join(); std::cout << counter << std::endl; }
Whatever platform you use to compile and run this program, you get an instant correct result:
$ time ./a.out 200000 real 0m0.008s
To confirm if the generated code is lock-free on your hardware, you can use the following method to check it (it is 1 in my case):
std::cout << counter.is_lock_free() << std::endl;
Hurray! Three goals have been achieved:
- The code is thread-safe
- It is very fast
- It is correct
Congratulations!