## Table of Contents

## Primary Motivation

My Github repository, Boost Concurrent Permutations and Combinations on CPU, has been up since January 2017. However, download is close to none while my older single threaded `next_combination`

has plenty of downloads. So I figured it must be the code usage is deemed too complex. Aim of this article is to show a very simple, albeit inefficient, multithreaded example to give users a guide. It should be relatively easy for user to come up with an efficient multithreaded code, once they understand the examples in this article. For each permutation and combination section, I am going to start off with a single threaded example then followed by a multi-threaded example.

## Single Threaded Permutation

First, I'll show you the example on using `next_permutation`

in single threaded scenario. The `while`

loop displays `std_permuted`

until `next_permutation`

returned `false`

when `std_permuted`

is detected to be in descending order. That is "`54321`

".

void usage_of_next_perm()
{
std::string std_permuted = "12345";
do
{
std::cout << std_permuted << std::endl;
} while (std::next_permutation(std_permuted.begin(), std_permuted.end()));
}

Output is as follows:

12345
12354
12435
12453
12534
12543
...
54123
54132
54213
54231
54312
54321

## Multi-Threaded Permutation

Next, I'll show you how to find permutation in multithreaded scenario. There are a total of 120 different permutations in collection of 5 elements. We'll use 4 threads in my example, meaning each thread will find 120/4=30 permutations. To tell you the truth, each thread will find 30-1=29 permutations, because the 1^{st} one is already computed by `find_perm_by_idx`

in the main thread and passed to the thread. To keep the discussion simple, we pretend each thread computes 30 permutations. 1^{st} thread finds 0^{th} to 29^{th} permutations, while 2^{nd} thread starts from 30^{th} to 59^{th} permutations, and 3^{rd} thread begins from 60^{th} to 89 and so on. From 2^{nd} thread and 3^{rd} thread to start from 30^{th} and 60^{th} permutation respectively, we need to make use of `find_perm_by_idx`

to find 30^{th} and 60^{th} permutation given their index, we'll find the rest with `next_permutation`

. Technically, it is possible to use `find_perm_by_idx`

to find all permutations by giving indices from 0 to 119 but it is simply not fleasible to do so, because `find_perm_by_idx`

is much much slower than `next_permutation`

. As a word of caution, `index_to_find`

must be of type that is large enough to store the factorial(n). For instance, you are just finding the 1st 10 permutations, `index_to_find`

type can store 0 to 10 but if it cannot store the factorial(n), `find_perm_by_idx`

will not generate the permutation correctly. This applies to `find_comb_by_idx`

as well. If your largest integer type is not large enough, you can consider using Boost Multiprecision Integer. And `vector_type`

can be any collection type that supports `push_back()`

and `size()`

methods, like `std::string`

. You can see I use `std::string`

in my example.

I join my theads before starting the next thread because `all_results`

is shared with all threads and it is not guarded with a lock, therefore not thread-safe. It is only meant to be a simple example on how to `find_perm_by_idx`

. You can see I call the slow `find_perm_by_idx`

in the main thread. It is your job to try to figure out how to call `find_perm_by_idx`

in the worker threads.

template<typename int_type, typename vector_type>
vector_type find_perm_by_idx(int_type index_to_find,
vector_type& original_vector);
void usage_of_perm_by_idx()
{
uint64_t index_to_find = 0;
std::string original_text = "12345";
std::string std_permuted = "12345";
std::vector<std::string> all_results;
size_t repeat_times = 30;
for (size_t i=0; i<4; ++i)
{
std::string permuted = concurrent_perm::find_perm_by_idx(index_to_find, original_text);
std::thread th([permuted, &all_results, repeat_times] () mutable {
all_results.push_back(permuted);
for (size_t j=1; j<repeat_times; ++j)
{
std::next_permutation(permuted.begin(), permuted.end());
all_results.push_back(permuted);
}
});
th.join();
index_to_find += repeat_times;
}
}

I skip the output here, I have verified they are the same as generated by single threaded `next_permutation`

.

## Single Threaded Combination

Next, we have come to `next_combination`

section. To compute the total combination count, we use `compute_total_comb`

. `total`

must be of type that is large enough to store the factorial(n).

void usage_of_next_comb()
{
std::string original_text = "123456";
std::string std_combined = "123";
uint64_t total = 0;
if(concurrent_comb::compute_total_comb(original_text.size(), std_combined.size(), total))
{
std::cout << std_combined << std::endl;
for (uint64_t i = 1; i < total; ++i)
{
stdcomb::next_combination(original_text.begin(), original_text.end(),
std_combined.begin(), std_combined.end());
std::cout << std_combined << std::endl;
}
}
}

Output is as follows. If your collection is sorted like `original_text`

, all generated combination will be sorted as well, as you can see below:

123
124
125
126
134
135
136
145
146
156
234
235
236
245
246
256
345
346
356
456

## Multi-Threaded Combination

`index_to_find`

must be of type that is large enough to store the factorial(n) and `vector_type`

can be any collection type that supports `push_back()`

and `size()`

methods, like `std::string`

. Similar to `find_perm_by_idx`

mentioned above, it is possible to find all combinations with `find_comb_by_idx`

but it is much slower compared to `next_combination`

, so we resort to `next_combination`

to find the rest of combinations. There are total of 20 combinations, so each of the 4 thread will find 20/4=5 combinations. Actually, each thread will find 5-1=4 combination, because the 1^{st} one is already computed by `find_comb_by_idx`

in the main thread and passed to the thread.

template<typename int_type, typename vector_type>
vector_type find_comb_by_idx(const uint32_t subset,
int_type index_to_find,
vector_type& original_vector);
void usage_of_comb_by_idx()
{
uint64_t index_to_find = 0;
std::string original_text = "123456";
std::string std_combined = "123";
std::vector<std::string> all_results;
size_t repeat_times = 5;
for (size_t i = 0; i < 4; ++i)
{
std::string combined = concurrent_comb::find_comb_by_idx(std_combined.size(),
index_to_find, original_text);
std::thread th([original_text, combined, &all_results, repeat_times] () mutable {
all_results.push_back(combined);
for (size_t j = 1; j < repeat_times; ++j)
{
stdcomb::next_combination(original_text.begin(), original_text.end(),
combined.begin(), combined.end());
all_results.push_back(combined);
}
});
th.join();
index_to_find += repeat_times;
}
}

Output is skipped here, afterall I have verified they are the same as generated by single threaded `next_combination`

.