In this hands-on tutorial, we will explore concurrent programming in C++ through the implementation of a concurrent quick sort and a lock-based hash table.

Building Blocks

Before diving into the detailed implementation, let’s first go through the building blocks of concurrent programming in C++.

std::thread

std::thread is a class that represents a single thread of execution. It can be used to create new threads that run concurrently with the calling thread.

#include <iostream>
#include <thread>

void thread_ex1() {
    std::thread t([] { std::cout << "Hello, World!" << std::endl; });
    t.join();
}

A thread can be created by passing a callable object (e.g., a lambda function) to the std::thread constructor. Before the std::thread object is destroyed, it must either be join()ed, which blocks until the thread finishes, or detach()ed, which lets it run independently of the calling thread; otherwise std::terminate is called.
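
For completeness, here is a minimal sketch of detach() (the sleep is only there so the detached thread gets a chance to print before the function returns; real code would use proper synchronization):

#include <chrono>
#include <iostream>
#include <thread>

void thread_detach_ex() {
    std::thread t([] { std::cout << "running in the background" << std::endl; });
    t.detach(); // the thread now runs on its own and can no longer be join()ed

    // Crude wait so the detached thread can finish printing before we return.
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
}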

Here is another example of using std::thread to call a member function of a class:

#include <cassert>
#include <iostream>
#include <thread>

struct A {
    int x = 2;
    void f(int y) {
        x += y;
    }
};

void thread_ex2() {
    A a;
    std::thread t(&A::f, &a, 3);
    t.join();
    assert(a.x == 5); // 2 + 3
}
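
One caveat: std::thread copies its arguments into storage owned by the new thread, so passing by reference requires std::ref. A minimal sketch (thread_ex3 and add_three are illustrative names):

#include <cassert>
#include <functional>
#include <thread>

void add_three(int& x) {
    x += 3;
}

void thread_ex3() {
    int n = 2;
    // std::ref is required here: the thread stores a copy of each argument,
    // and a copy cannot bind to the int& parameter (this would not compile).
    std::thread t(add_three, std::ref(n));
    t.join();
    assert(n == 5);
}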

std::future

std::future is a class template representing a result that will become available in the future. A std::future can be obtained from std::async, which abstracts away the thread management, making it simple to run a function asynchronously and retrieve its result later.

#include <cassert>
#include <future>

void future_ex1() {
    std::future<int> fut = std::async([] { return 25; });
    assert(25 == fut.get());
}

There are two other ways to obtain a std::future object: std::promise and std::packaged_task. Briefly, std::promise separates the producer of a value from its consumer, while std::packaged_task wraps a callable so that its result can be retrieved through a future, decoupling function execution from result retrieval.

#include <cassert>
#include <future>
#include <thread>

void producer(std::promise<int>&& prom) {
    using namespace std::chrono_literals;
    std::this_thread::sleep_for(1s);
    prom.set_value(6);
}

void future_ex2() {
    std::promise<int> prom;
    std::future fut = prom.get_future();
    std::thread t(producer, std::move(prom));
    assert(6 == fut.get());
    t.join();
}

void future_ex3() {
    std::packaged_task<int(int)> task([](int x) { return x * x; });
    std::future fut = task.get_future();

    std::thread t(std::move(task), 5);
    assert(25 == fut.get());
    t.join();
}

std::mutex

std::mutex is a synchronization primitive that can be used to protect shared data from being simultaneously accessed by multiple threads.

#include <mutex>

void mutex_ex1() {
    std::mutex mut;
    mut.lock();
    // access shared resource
    mut.unlock();
}

std::lock_guard is an RAII wrapper that locks a mutex on construction and automatically unlocks it on destruction:

void mutex_ex2() {
    std::mutex mut;
    std::lock_guard lock(mut);
    // access shared resource
}

std::shared_mutex

std::shared_mutex is used to implement a readers-writer lock. std::unique_lock acquires exclusive (write) access, while std::shared_lock acquires shared (read) access, so multiple readers can proceed concurrently.

#include <shared_mutex>

void shared_mutex_ex1() {
    std::shared_mutex mut;
    {
        std::unique_lock lock(mut);
        // write access
    }
    {
        std::shared_lock lock(mut);
        // read access
    }
}

Quick Sort

Let’s take a look at how to utilize multiple threads in the implementation of quick sort. To begin with, here is a sequential quick sort implementation.

#include <algorithm>
#include <cassert>
#include <vector>

template <typename Iterator>
void quick_sort(Iterator first, Iterator last) {
    if (std::distance(first, last) <= 1) { // a range of zero or one element is already sorted
        return;
    }

    const auto& pivot = *first;
    const Iterator divide_point = std::partition(
        std::next(first), last,
        [&](const auto& x) { return x < pivot; });

    std::iter_swap(first, std::prev(divide_point));
    quick_sort(first, std::prev(divide_point));
    quick_sort(divide_point, last);
}

void quick_sort_ex1() {
    std::vector<int> v{5, 3, 2, 4, 1};
    quick_sort(v.begin(), v.end());
    assert(std::is_sorted(v.begin(), v.end()));
}

The key to parallelizing quick sort is that each call produces two independent tasks: sorting the left partition and sorting the right partition. We can use std::async to sort the left part on another thread while sorting the right part on the current thread.

#include <algorithm>
#include <cassert>
#include <future>
#include <iostream>
#include <thread>
#include <vector>

template <typename Iterator>
void parallel_quick_sort(Iterator first, Iterator last) {
    std::cout << std::this_thread::get_id() << std::endl; // observe which thread runs this call
    if (std::distance(first, last) <= 1) { // a range of zero or one element is already sorted
        return;
    }

    const auto& pivot = *first;
    const Iterator divide_point = std::partition(
        std::next(first), last,
        [&](const auto& x) { return x < pivot; });
    std::iter_swap(first, std::prev(divide_point));

    std::future fut = std::async(
        parallel_quick_sort<Iterator>, first, std::prev(divide_point));
    parallel_quick_sort(divide_point, last);
    fut.get();
}

How many threads are created by the above code? Well, it depends. std::async can launch a new thread to execute the function immediately (std::launch::async) or defer execution until the result is needed, e.g., when get() is called (std::launch::deferred). If no launch policy is specified, the implementation can choose either way depending on factors such as resource availability and optimization strategies.
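
The policy can be made explicit by passing it as the first argument. A minimal sketch (the function name is illustrative):

#include <cassert>
#include <future>

void async_policy_ex() {
    // Always executes on a new thread.
    std::future<int> eager = std::async(std::launch::async, [] { return 1; });

    // Executes lazily on the thread that calls get() or wait().
    std::future<int> lazy = std::async(std::launch::deferred, [] { return 2; });

    assert(3 == eager.get() + lazy.get());
}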

Hash Table

Let’s delve into the implementation of a lock-based hash table. The obvious way to implement a thread-safe hash table is to protect a std::unordered_map with a single mutex.

#include <cassert>
#include <mutex>
#include <string>
#include <unordered_map>

template <typename K, typename V>
class hash_table {
private:
    std::unordered_map<K, V> data;
    std::mutex mut;

public:
    hash_table() = default;

    void insert(K key, V value) {
        std::lock_guard lk(mut);
        data[key] = value;
    }

    V get(K key) {
        std::lock_guard lk(mut);
        return data[key]; // note: operator[] inserts a default-constructed V for a missing key
    }

    void remove(K key) {
        std::lock_guard lk(mut);
        data.erase(key);
    }
};

void hash_table_ex1() {
    hash_table<int, std::string> table;
    table.insert(5, std::string{"5"});
    table.insert(9, "9");
    table.remove(9);
    assert("5" == table.get(5));
    assert("" == table.get(9));
}

The above implementation is simple. However, it is not efficient: the single mutex serializes every operation, so only one thread can access the hash table at a time. A readers-writer lock improves the situation by letting readers proceed in parallel, but all writers still contend for the whole table, so it is not optimal.
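
For reference, here is a minimal sketch of the readers-writer variant of the table above (rw_hash_table is an illustrative name; only the locking differs from the mutex-based version):

#include <shared_mutex>
#include <unordered_map>

template <typename K, typename V>
class rw_hash_table {
private:
    std::unordered_map<K, V> data;
    std::shared_mutex mut;

public:
    void insert(K key, V value) {
        std::unique_lock lock(mut); // exclusive: blocks both readers and writers
        data[key] = value;
    }

    V get(K key) {
        std::shared_lock lock(mut); // shared: concurrent readers are allowed
        const auto it = data.find(key);
        return it == data.end() ? V() : it->second;
    }
};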

To maximize the potential of concurrency, we can use more fine-grained locking. Each bucket in the hash table has its own mutex, allowing multiple threads to access different buckets concurrently.

#include <algorithm>
#include <cassert>
#include <list>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <vector>

template <typename K, typename V, typename Hash = std::hash<K>>
class hash_table {
private:
    class bucket {
    public:
        using value_type = std::pair<K, V>;

        bucket() = default;

        V get(const K& key) {
            std::shared_lock lock(mut);
            const auto it = find(key);
            return it == data.end() ? V() : it->second;
        }

        void insert(const K& key, const V& value) {
            std::unique_lock lock(mut);
            const auto it = find(key);
            if (it == data.end()) {
                data.push_back({key, value});
                return;
            }
            it->second = value;
        }

        void remove(const K& key) {
            std::unique_lock lock(mut);
            std::erase_if(data, [&](const auto& kv) { return kv.first == key; });
        }

    private:
        std::list<value_type> data;
        std::shared_mutex mut;

        typename decltype(data)::iterator find(const K& key) {
            return std::find_if(data.begin(), data.end(),
                [&](const auto& kv) { return kv.first == key; });
        }
    };

public:
    hash_table(std::size_t size, Hash hasher = Hash())
        : size(size), buckets(size), hasher(hasher) {}

    hash_table(const hash_table& other) = delete;
    hash_table(hash_table&& other) = delete;
    hash_table& operator=(const hash_table& other) = delete;
    hash_table& operator=(hash_table&& other) = delete;

    V get(const K& key) {
        return find(key).get(key);
    }

    void insert(const K& key, const V& value) {
        find(key).insert(key, value);
    }

    void remove(const K& key) {
        find(key).remove(key);
    }

private:
    std::size_t size;
    std::vector<bucket> buckets;
    Hash hasher;

    bucket& find(const K& key) {
        return buckets[hasher(key) % size];
    }
};

Most of the heavy lifting is done by the bucket class, which is responsible for managing the data in each bucket. The hash_table class itself forwards the operations to the appropriate bucket based on the hash of the key.
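
As a quick sanity check, here is a small usage sketch (hash_table_ex2, the bucket count, and the keys are arbitrary):

#include <cassert>
#include <string>
#include <thread>
#include <vector>

void hash_table_ex2() {
    hash_table<int, std::string> table(16);

    // Writers touching different keys usually land in different buckets,
    // so they rarely contend with one another.
    std::vector<std::thread> writers;
    for (int i = 0; i < 4; ++i) {
        writers.emplace_back([&table, i] { table.insert(i, std::to_string(i)); });
    }
    for (auto& t : writers) {
        t.join();
    }

    assert("2" == table.get(2));
}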

Conclusion

In this tutorial, we explored the foundational concepts of concurrent programming in C++ and demonstrated practical implementations of a parallel quick sort algorithm and a lock-based hash table. By utilizing concurrency features such as std::thread, std::future, and various mutex types, we can significantly improve the performance and responsiveness of our programs.