Click here to Skip to main content
15,849,487 members
Articles / General Programming / Algorithms

C++ Implementations of Quicksort Methods for Sorting Arrays with Lots of Duplicates

Rate me:
Please Sign up or sign in to vote.
4.83/5 (32 votes)
31 Aug 2015CPOL16 min read 51.3K   440   31   25
Efficient Quicksort methods for sorting arrays with lots of duplicate elements

Introduction

This article presents implementation of the Quicksort methods that are most suitable for sorting arrays with a lot of duplicate elements. Recursive implementations, as well as those using a stack and multithreading are provided. Benchmarks are included for 4-core and 2-core processors. Comparison with other sorting methods (Timsort, Heapsort and Mergesort) is given. The code attached has been tested using VC++ 2015 and GCC C++ 5.1.

Background

Since Tony Hoare wrote his article "Quicksort" [1-3], quite a few various Quicksort partitioning algorithms have been proposed. Some of them are not well suited for sorting arrays with duplicates. For example, with a lot of duplicates the Lomuto partitioning [4] can easily run out of stack space. In this article, I will look at the most appropriate, most efficient algorithms for sorting arrays with duplicates.

The task of sorting arrays with lots of duplicates may seem to be artificial and theoretical, but such tasks can be found in the real world: for example, sorting the words from a text. I will be presenting this example. In practice, we may find many problems like that when we use non-unique keys for sorting data: telephone calls (the phone number is the key), car service (the car make is the key), visitors to a museum (the place the visitors are coming from is the key).

Various Partitioning Schemes

I have selected the following three partitioning schemes: Hoare Partitioning, Three-Way Partitioning and Dual-Pivot Partitioning.  During the preparation of this article, I investigated other methods (Lomuto Partitioning, Dijkstra Three-Way Partitioning [5] and their modifications). But they require more comparisons than the selected methods.

Hoare Partitioning

This partitioning remains one of the most efficient when it comes to general sorting. Here is a recursive implementation in C++:

typedef std::ptrdiff_t Index;

template <class Iterator, class Comparison = std::less<>>
void HoareQuickSort(Iterator a, Iterator finish, Comparison compare = Comparison())
{
    Index right = (finish - a) - 1;
    Index left = 0;

    if (right <= left)
        return;

    auto pivot = a[left + (right - left) / 2];

    Index i = left;
    Index j = right;

    while (i <= j)
    {
        while (compare(a[i], pivot))
            ++i;

        while (compare(pivot, a[j]))
            --j;

        if (i <= j)
        {
            std::swap(a[i++], a[j--]);
        }
    }

    HoareQuickSort(a, a + (j + 1), compare);
    HoareQuickSort(a + i, finish, compare);
}

I will use the Index type throughout the article. You may wish to define is as int, but do not use unsigned or std::size_t, as there are methods where negative values are needed.

Comparison can be submitted as a function or a comparison class. If the comparison object is not supplied, the array will be sorted in ascending order. Here are some examples:

bool gt(double x, double y)
{
    return x > y;
}

struct GT
{
    bool operator()(double x, double y) const
    {
        return x > y;
    }
};

int main()
{
    std::vector<double> a{ -5, 2, 3, 4.5 };
    HoareQuickSort(std::begin(a), std::end(a));
    for (auto&& e : a)
    {
        std::cout << e << " "; // -5 2 3 4.5
    }
    std::cout << std::endl;         

    std::vector<double> x{ -5, 2, 3, 4.5 };
    HoareQuickSort(std::begin(x), std::end(x), gt);
   
    for (auto&& e : x)
    {
        std::cout << e << " "; // 4.5 3 2 -5
    }
    std::cout << std::endl;

    std::vector<double> y{ -15, 20, 3, 4.5 };
    HoareQuickSort(std::begin(y), std::end(y), GT());   

    for (auto&& e : y)
    {
        std::cout << e << " "; // 20 4.5 3 -15
    }
    std::cout << std::endl;
}

Three-Way Partitioning

The partitioning scheme that I am going to discuss here is a slight modification of Bentley-McIlroy partitioning [6, 7], which was modified by Robert Sedgewick [8-10]. The method proposed here has fewer swaps than the original. This target of this partitioning scheme is to produce three partitions:

  1. the elements, which are "less" than the pivot value (strictly-speaking where the compare object returns true);
  2. the elements that are equal to the pivot value (!compare(a[i], pivot) && !compare(pivot,a[i]) is true);
  3. the elements, which are "greater" than the pivot (where the compare function returns false).

But there are various ways of achieving that. In Bentley-McIlroy algorithm, the biginning is similar to Hoare partitioning: two indices move simultaneously from both ends of the array. But, meanwhile, extra comparisons is done and extra partitions are created from the elements that are equal to the pivot value. At first four partitions are created:

  1. some of the elements that are equal to the pivot;
  2. the element that are "less" than the pivot;
  3. the elements that are "greater" than the pivot;
  4. the rest of the elements that are equal to the pivot.

Then partitions 1 and 2 are moved between partitions 2 and 3. You may think that there are losts of swaps and there may be a faster way achieving Three-Way Partitioning. But it appears to be the fastest way. For example, Dijkstra Three-Way Partitioning [5] looks simpler, but requires more comparisons.

Here is a recursive Three-Way Quicksort implementation in C++:

template <class Iterator, class Comparison = std::less<>>
void ThreeWayQuickSort(Iterator a, Iterator finish, Comparison compare = Comparison())
{
    Index right = (finish - a) - 1;
    Index left = 0;

    if (right <= 0)
        return;

    Index idx1 = left + (right - left) / 2;   

    auto pivot = a[idx1];

    Index i = left;
    Index j = right;

    Index index1 = left - 1;
    Index index2 = right + 1;

    while (i <= j)
    {
        while (compare(a[i], pivot))
            ++i;

        while (compare(pivot, a[j]))
            --j;

        if (i < j)
        {
            if (compare(pivot, a[i]))
            {
                if (compare(a[j], pivot))
                {
                    std::swap(a[i], a[j]); // a[i] != pivot and a[j] != pivot
                }
                else
                {
                    std::swap(a[i], a[j]);  // a[i] != pivot and a[j] == pivot
                    std::swap(a[++index1], a[i]);
                }
            }
            else if (compare(a[j], pivot)) // a[i] == pivot a[j] != pivot
            {
                std::swap(a[i], a[j]);
                std::swap(a[j], a[--index2]);
            }
            else // a[i] == pivot and a[j] == pivot
            {
                std::swap(a[++index1], a[i]);
                std::swap(a[j], a[--index2]);
            }

            ++i;
            --j;
        }
        else if (i == j)
        {
            j = i - 1;
            i++;
            break;
        }
    }
   
    for (Index k = left; k <= index1; k++)
    {
        std::swap(a[k], a[j--]);       
    }

    for (Index k = right; k >= index2; k--)
    {
        std::swap(a[i++], a[k]);       
    };

    ThreeWayQuickSort(a, a + (j + 1), compare);
    ThreeWayQuickSort(a + i, finish, compare);
}

Dual-Pivot Partitioning

This scheme is similar to the method proposed by Vladimir Yaroslavskiy [11].  I have slighly simplified and modified it.  The basic idea is that two pivots are selected and three partitions are created. Here is the main part of modification: there are to approaches depending on whether the pivots are the same or not. If the pivots are different and we order them so that compare(pivot1, pivot2) is true (we say, pivot1 is "less" than pivot2), then the following partitions are created:

  1. the elements that are "less" than or equal to pivot1 (!compare(pivot1,a[i]) is true);
  2. the elements that are between pivot1 and pivot2 (compare(pivot1,a[i]) && compare(a[i],pivot2) is true);
  3. the elements that are "greater" than or equal to pivot2 (!compare(a[i],pivot2) is true).

If the selected pivots are the same, the partitions are slightly different:

  1. the elements that are "less" than pivot1 (compare(a[i],pivot1) is true);
  2. the elements that are equal to pivot1 (and obviously equal to pivot2) (!compare(pivot1,a[i]) && !compare(a[i],pivot1) is true);
  3. the elements that are "greater" than pivot1 (compare(pivot1,a[i]) is true).

If the pivots are equal, obviously the second partition does not have to be sorted.

Here is a recursive Dual-Pivot C++ implementation:

template <class Iterator, class Comparison = std::less<>>
void DualPivotQuickSort(Iterator  a, Iterator finish, Comparison compare = Comparison())
{
    Index right = (finish - a) - 1;
    Index left = 0;

    if (right <= left)
    {
        return;
    }

    auto idx1 = left + (right - left) / 3;
    auto idx2 = right - (right - left) / 3;

    std::swap(a[idx1], a[left]);
    std::swap(a[idx2], a[right]);

    if (compare(a[right], a[left]))
    {
        std::swap(a[left], a[right]);
    }

    if (right - left == 1)
    {
        return;
    }

    auto pivot1 = a[left];
    auto pivot2 = a[right];

    if (compare(pivot1, pivot2))
    {
        Index less = left + 1;
        Index greater = right - 1;

        while (!compare(a[greater], pivot2))
        {
            greater--;
        }

        while (!compare(pivot1, a[less]))
        {
            less++;
        }

        Index k = less;
        while (k <= greater)
        {
            if (!compare(pivot1, a[k]))
            {
                std::swap(a[k], a[less++]);
            }
            else if (!compare(a[k], pivot2))
            {
                std::swap(a[k], a[greater--]);
                while (!compare(a[greater], pivot2))
                {
                    greater--;
                }

                if (!compare(pivot1, a[k]))
                {
                    std::swap(a[k], a[less++]);
                }
            }
            k++;
        }

        std::swap(a[less - 1], a[left]);
        std::swap(a[greater + 1], a[right]);

        DualPivotQuickSort(a, a + less - 1, compare);
        DualPivotQuickSort(a + greater + 2, finish, compare);
        DualPivotQuickSort(a + less, a + greater + 1, compare);
    }
    else
    {
        // pivot1 == pivot2
        Index less = left + 1;
        Index greater = right - 1;

        while (compare(pivot1, a[greater]))
        {
            greater--;
        }

        while (compare(a[less], pivot1))
        {
            less++;
        }
       
        Index k = less;
        while (k <= greater)
        {
            if (compare(a[k], pivot1))
            {
                std::swap(a[k], a[less++]);
            }
            else if (compare(pivot1, a[k]))
            {
                std::swap(a[k], a[greater--]);
                while (compare(pivot1, a[greater]))
                {
                    greater--;
                }

                if (compare(a[k], pivot1))
                {
                    std::swap(a[k], a[less++]);
                }
            }
            k++;
        }

        std::swap(a[less - 1], a[left]);
        std::swap(a[greater + 1], a[right]);
       
        DualPivotQuickSort(a, a + less - 1, compare);
        DualPivotQuickSort(a + greater + 2, finish, compare);
    }
}

First Indication of Efficiency: Counting Comparisons and Swaps

In this section I will look at the numbers of comparions and swaps for the selected partitioning methods. It will give a rough indication of the speed of the methods.

I have tested the Quicksort methods on vectors of 10 million random-generated elements with various numbers of distinct elements. The produced number of comparisons is shown in Figure 1. This may be a good indication of speed when the comparisons take much more time than the swaps, as in case of std::vector<std::string>.

Image 1

Figure 1.  Number of Comparisons

As we can see, Hoare Quicksort is less dependent on the number of distinct elements and is outperformed by the other two methods when there are a lot of duplicates. Three-Way Quicksort is quite good when there are many duplicates, but starts losing its efficiency when the number of duplicates is smaller.

In Figure 2, the numbers of swaps is shown.

Image 2

Figure 2.  Number of Swaps

Three-Way Quicksort has the fewest number of swaps, except in cases when the number of distinct elements is close to 100%.

In Figure 3, I have shown the sum of comparisons and swaps. This may be of good indication of the speed when the comparisons and swaps take approximately the same amount of time, as in case of, for example, std::vector<double>.

Image 3

Figure 3. Number of Comparisons plus Swaps. Good indication of speed for std::vector<double>

Implementations using a Stack

In order to use a stack, Hoare Partitioning itself can be implemented as follows:

template <class Elem, class Comparison>
struct HoarePartition
{
    ParamList operator()(Elem *a, Index left, Index right, Comparison compare)
    {
        Index i = left;
        Index j = right;
        Elem pivot = a[left+(right-left)/2];

        while (i <= j)
        {
            while (compare(a[i], pivot))
                ++i;

            while (compare(pivot, a[j]))
                --j;

            if (i <= j)
            {
                std::swap(a[i], a[j]);
                ++i;
                --j;
            }
        }

        return { {left,j}, {i, right} };
    }
};

For the other partitioning schemes, the implementations will be similar. The ranges of values are pushed into a ParamList object, where they are ordered in the descending order of range sizes.  When the values will be moved from the ParamList object onto the stack, the largest range will be pushed first, which means that, according the LIFO access rule, it will be considered last. This will allow to minimize the size of the stack.

The definition of the related classes is as follows:

struct QuickSortParams
{
    Index m_left;
    Index m_right;
    QuickSortParams():m_left(0),m_right(0) {}
    QuickSortParams(Index left, Index right) :m_left(left), m_right(right) {}

    void setParams(Index left, Index right)
    {
        m_left = left;
        m_right = right;
    }

    void getParams(Index& left, Index& right)
    {
        left = m_left;
        right = m_right;
    }
};

bool CompareParams(const QuickSortParams& p1, const QuickSortParams& p2)
{
    Index d1 = p1.m_right - p1.m_left;
    Index d2 = p2.m_right - p2.m_left;

    if (d1 > d2)
        return true;
    return false;
}

template<class Elem, class Comparison = std::less<Elem>>
void SmallSort(Elem* a, Index left, Index right, Comparison compare = Comparison())
{
    for (Index i = left; i < right; i++)
    {
        for (Index j = i; j >= left && compare(a[j + 1], a[j]); --j)
        {
            std::swap(a[j + 1], a[j]);
        }
    }
};

struct ParamList
{
    std::vector<QuickSortParams> m_vector;
    ParamList() {}

    ParamList(std::initializer_list<QuickSortParams> list) : m_vector(list)
    {
        SmallSort(m_vector.data(), 0, m_vector.size() - 1, CompareParams);
        for (Index i = 0; i < m_vector.size(); i++)
        {
            if (m_vector[i].m_right - m_vector[i].m_left <= 0)
            {
                m_vector.erase(std::begin(m_vector) + i, std::end(m_vector));
                break;
            }
        }
    }

    std::size_t size() const
    {
        return m_vector.size();
    }
   
    QuickSortParams& operator[](std::size_t i)
    {
        return m_vector[i];
    }

    std::vector<QuickSortParams>::iterator begin()
    {
        return m_vector.begin();
    }

    std::vector<QuickSortParams>::iterator end()
    {
        return m_vector.end();
    }
};

Here is the stack implementation:

template <template<class, class> class Method, class Iterator, class Comparison>
void QuickSortRun(Iterator start, Iterator finish, Comparison compare)
{
    Index right = (finish - start) - 1;
    Index left = 0;

    if (right <= 0)
        return;

    typedef typename std::remove_reference<decltype(*start)>::type Elem;
    Elem* a = &*start;

    const Index stackDepth = 64;
    QuickSortParams stack[stackDepth];
    Index stackIndex = -1;

    Method<Elem, Comparison> method;

    while (true)
    {

        if (right - left <= 32)
        {
            SmallSort(a, left, right, compare);
            if (stackIndex != -1)
            {
                stack[stackIndex--].getParams(left, right);
                continue;
            }
            break;
        }

        Index dist = (right - left);
        Index i0 = left;
        Index i1 = left + dist / 3;                    
        Index i2 = left + dist / 2;               
        Index i3 = right - dist / 3;
        Index i4 = right;

        SmallSort(a, { i0,i1,i2,i3,i4 }, compare);       

        ParamList params = method(a, left, right, compare);

        for (auto& p : params)
        {           
            stack[++stackIndex].setParams(p.m_left, p.m_right);        
        }

        if (stackIndex != -1)
        {
            stack[stackIndex--].getParams(left, right);
        }
        else
        {
            break;
        }
    }
}

First of all, if the distance between the first and the last index is less than 32, SmallSort is used, which is based on Insertionsort. Then five elements are selected and ordered using the following, different SmallSort implementation:

template<class Elem, class Comparison = std::less<Elem>>
void SmallSort(Elem* a, std::vector<Index> idx, Comparison compare = Comparison())
{
    Index left = 0;
    Index right = idx.size() - 1;
    for (Index i = left; i < right; i++)
    {
        for (Index j = i; j >= left && compare(a[idx[j + 1]], a[idx[j]]); --j)
        {
            std::swap(a[idx[j + 1]], a[idx[j]]);
        }
    }
}

Then the partitioning method is called, which is provided as a template parameter. The method returns a collection of ranges, which are pushed onto the stack. Then ranges are taken off the stack (if available) and the loop continues until the stack is empty.

Finally, the stack implementation HoareQuickSort will be as follows:

template <class Iterator, class Comparison = std::less<>>
void HoareQuicksort(Iterator start, Iterator finish, Comparison compare = Comparison())
{
   QuickSortRun<HoarePartition>(start, finish,compare); 
}

Multithreading Implementations

The approach is similar to the stack implementation. But instead of pushing values onto the stack, we create threads:

const unsigned SortThreadCount = 7;

template <template<class, class> class Method, class Iterator, class Comparison>
void ParQuickSortRun(Iterator start, Iterator finish, Comparison compare)
{
    Index right = (finish - start) - 1;
    Index left = 0;

    if (right <= 0)
        return;

    typedef typename std::remove_reference<decltype(*start)>::type Elem;
    Elem* a = &*start;

    std::vector<std::thread> threads;

    if ((right-left) < 100000)
    {
        QuickSortRun<Method, Elem*, Comparison>(a + left, a + right + 1, compare);
        return;
    }

    Index range = (right - left) / SortThreadCount;
    
    Method<Elem, Comparison> method;

    while (true)
    {

        if (right - left < range)
        {
            threads.push_back(std::thread(QuickSortRun<Method, Elem*, Comparison>, a + left, a + right + 1, compare));
            break;

        }

        Index dist = (right - left);       
        Index i0 = left;
        Index i1 = left + dist / 3;       
        Index i2 = left + dist / 2;       
        Index i3 = right - dist / 3;       
        Index i4 = right;

        SmallSort(a, { i0,i1,i2,i3,i4 }, compare);

        ParamList params = method(a, left, right, compare);

        bool first = true;

        Index n = params.size();
        for (Index i = n - 1; i >= 0; i--)
        {
            QuickSortParams& p = params[i];
            if (first)
            {
                first = false;
                left = p.m_left;
                right = p.m_right;
            }
            else
            {
                threads.push_back(std::thread(ParQuickSortRun<Method, Iterator, Comparison>, start + p.m_left, start + p.m_right+1, compare));
            }                       
        }

        if (first)
            break;
    }

    for (unsigned i = 0; i < threads.size(); i++)
    {
        threads[i].join();
    }
}

There is the SortThreadCount constant, which I have set to 7. It seems to work well for both 2-core and 4-core processors. But, obviously, for processors with more cores this constant value should be increased. If the distance between the first and the last index is less than 100,000 we don't use multithreading. Then, in the same way as for the stack case,  we order the selected 5 elements and call the method. When the list of ranges is returned the last range is used in the current thread, the rest are used for new threads.

The loop continues until either there are no ranges left of the range size is less than the size of the whole array divided by SortThreadCount. In the latter case the non-multithreading method will be used for the last time. After the loop has finished, the ParQuickSortRun function must wait until all the threads have terminated.

Benchmarking

Random-Generated Data

The following tests have been carried out to produce benchmarks:

  • using stack and multithreaded implementations for the three QuickSort algorithms, as well as using std::sort;
  • all the tests have been compiled both in VC++2015 and GCC C++ 5.1 for an i7-4790, 4 cores, 3.6 GHz  with 8 GB memory;
  • all the tests have been compiled for GCC C++ 5.1 for an i5-2410, 2 cores, 2.3 GHz, with 6 GB memory;
  • the three types of arrays have been considered: std::vector<string> (with strings over 71 characters long -- slow on comparison, but fast on swap), std::vector<double> (fast on comparison and swap), std::vector<Block>(where Block is a class, containing a 31-charater array -- slow on both comparison and swap).

  Tests for std::vector<Block>

The Block class  and the comparison function were defined as follows:

struct Block
{
    static constexpr std::size_t N = 31;
    unsigned char x[N];
    Block(unsigned v = 0)
    {
        for (int i = 0; i < N; i++)
        {
            x[i] = (v >> i) & 1;
        }
    };
};

bool operator<(const Block& a, const Block& b)
{
    for (int i = 0; i < Block::N; i++)
    {
        if (a.x[i] < b.x[i])
            return true;
        if (a.x[i] > b.x[i])
            return false;
    }
    return false;
}

The tests for VC++ on  i7-4790 are shown in Figure 4, for GCC C++ on  i7-4790 in Figure 5, for GCC C++ on i5-2410 in Figure 6.

Image 4

Figure 4.

Image 5

Figure 5.

Image 6

Figure 6.

I use the word "Par" to name multithreaded implementations. The results show that on average the multithreaded implemetations performed over 3.5 times as fast as their non-multithreaded versions on the 4-core i7 processor, and more than twice as fast on the 2-core i5.  Dual-Pivot Quicksort was faster than Hoare QuickSort in all cases. Three-Way Quicksort was faster than Dual-Pivot only in some cases when there were very few distinct elements.

The std::sort on VC++ seems to be similar in peformance to Three-Way Quicksort; and on GCC C++ it is similar to Hoare QuickSort.

Tests for std::vector<std::string>

The sorting of std::vector<std::string> tests for VC++ on  i7-4790 are shown in Figure 7, for GCC C++ on  i7-4790 in Figure 8, for GCC C++ on i5-2410 in Figure 9.

Image 7

Figure 7.

Image 8

Figure 8.

Image 9

Figure 9.

Here the results are a bit similar to blocks. Dual-Pivot Quicksort was faster than Hoare Quicksort (almost the same when the number of distinct elements was close to the size of the array). Dual-Pivot Quicksort was slightly less efficient than Three-Way Quicksort in some cases when the numbers distict element were less than 1000.

The std::sort performed not as good the corresponding  Three-Way on VC++ and Hoare on GCC C++.

Tests for std::vector<double>

The sorting of <font face="Courier New">std::vector<double></font> tests for VC++ on  i7-4790 are shown in Figure 10, for GCC C++ on  i7-4790 in Figure 11, for GCC C++ on i5-2410 in Figure 12.

Image 10

Figure 10.

Image 11

Figure 11.

Image 12

Figure 12.

Here Dual-Pivot Quicksort beats Hoare Quicksort. Dual-Pivot was slightly slower than Three-Way Quicksort in the mid-range of duplicates (distinct elements in the range between 100 and 100,000).  GCC C++ implementation of std::sort was faster than Hoare Quicksort and was faster than single-threaded Dual-Pivot Quicksort. The VC++ std::sort ran slower than single-threaded Three-Way QuickSort.

Comparison with Other Methods

Here I would like to show some comparison with other methods: Timsort [12-15], Heapsort [8] (std::sort_heap) and Mergesort [8].  I will use the following implementation of Mergesort:

template<class Elem, class Comparison>
void merge(Elem *v, Elem* a, Index left, Index right, Index rightEnd, Comparison compare)
{
    Index temp = left, leftEnd = right - 1;
    Index start = left;

    for (Index i = start; i <= rightEnd; i++)
    {
        a[i] = std::move(v[i]);
    }

    while ((left <= leftEnd) && (right <= rightEnd))
    {
        if (compare(a[left], a[right]))
        {
            v[temp++] = std::move(a[left++]);
        }
        else
        {
            v[temp++] = std::move(a[right++]);
        }
    }
    while (left <= leftEnd)
    {
        v[temp++] = std::move(a[left++]);
    }

    while (right <= rightEnd)
    {
        v[temp++] = std::move(a[right++]);
    }
}

template<class Elem, class Comparison>
void mSort(Elem *a, Elem* v, Index left, Index right, Index limit, Comparison compare)
{
    if (left<right)
    {

        if (right - left < 32)
        {
            SmallSort(a, left, right, compare);
            return;
        }

        bool sorted = true;
        for (Index i = left; i < right; i++)
        {
            if (compare(a[i + 1], a[i]))
            {
                sorted = false;
                break;
            }
        }

        if (sorted)
            return;

        // checking for reverse-sorted
        bool reverse_sorted = true;
        for (Index i = left; i < right; i++)
        {
            if (compare(a[i], a[i + 1]))
            {
                reverse_sorted = false;
                break;
            }
        }

        if (reverse_sorted)
        {
            Index i = left;
            Index j = right;
            while (i < j)
            {
                std::swap(a[i++], a[j--]);
            }
            return;
        };

        Index center = (left + right) / 2;
        std::thread th1;
        if (right - left > limit)
        {
            th1 = std::thread(mSort<Elem, Comparison>, a, v, left, center, limit, compare);
            mSort(a, v, center + 1, right, limit, compare);
            th1.join();
        }
        else
        {
            mSort(a, v, left, center, limit, compare);
            mSort(a, v, center + 1, right, limit, compare);
        }
        merge(a, v, left, center + 1, right, compare);
    }
}

template<class Iterator, class Comparison = std::less<>>
void MergeSort(Iterator start, Iterator finish, Comparison compare = Comparison())
{
    Index right = (finish - start) - 1;

    if (right <= 0)
        return;

    typedef typename std::remove_reference<decltype(*start)>::type Elem;
    Elem* a = &*start;

    Elem *temp = new Elem[right + 1];
    Index limit = (right + 1) / (SortThreadCount*2);
    mSort(a, temp, 0, right, limit, compare);
    auto heap2 = MemoryHeapUsed();
    delete[] temp;
}

The key features are as follows:

  1. The standard move assigment is used for merging, which is efficient for movable objects, such as strings, containers.
  2. Before sorting each fragment is checked whether it is sorted or reverse-sorted: it makes sorting more efficient for oscillating data;
  3. Multithreading is used.

For Timsort, I used the implementation, provided in [14]. I have slighly modified it in the attached zip-file: it has a simpler comparison implementation, which allows it to run slightly faster then the original. I have also added the ability to measure the heap size used by the algorithm. The data for sorting random-generated strings is given in Figure 13.

Image 13

Figure 13.

Timsort does not have any avantage on random-generated strings. In addition, it used between 479MB and 532MB of extra heap space in these tests. I will discuss Timsort and other methods more in the next section when I will cover the benchmarks for partially-sorted data. Mergesort is not the most efficient, but is performing quite well.

Partially-Sorted Data

In practice, it is often the case that data is correlated or even partially-sorted.  I show the tests for partially sorted data of two types:

  • type 1 - oscillating data (basically a sine curve) shown in Figure 14;
  • type 2 - oscillating data climbing a slope (similar to sin(x)+x)  shown in Figure 15.

 

 

Image 14

Figure 14.

Image 15

Figure 15.

I have specially tested Timsort on arrays if Big blocks of data, each block containing 1000 bytes. Timsort has an advantage in some tests over other sorting methods as you may see in Figure 16.

Image 16

Figure 16.  Testing Arrays with Partially-Sorted Data Type 1. Each element contains 1000 bytes.

You may see here that Timsort does not perform that well when there are lots of duplicates, but when the number of duplicates decreases its performance dramatically improves. Actually, you may see that Mergesort is very close to Timsort in performance. The reason is that Timsort utilizes the Mergesort algorithm. In these tests the Par Dual-Pivot  and Par Three-Way Quicksort methods show the best peformance when there are lots of duplicates. But Timsort is the best when it comes to few duplicates.

Let's us now look and Partially-Sorted Data Type 2. The tests are shown in Figure 17. The "changing slopes" in brackets means how many times the slope changes its direction in the graph: from "ascending" to "descending" or vice versa. If this value is zero, it means that the array is sorted or sorted in the reverse order. In this graph, it means that the vector is sorted in  ascending order.

Image 17

Figure 17.  Testing Arrays with Partially-Sorted Data Type 2.  In the case of 1,000,000 distinct elements, the data is sorted.

The Timsort performance the best here. It outperforms all the other methods.  But this is the case with very few duplicates.  When all the elements are sorted Timsort has the best performance as well: 0.54 sec. But here the data elements are big and they take some time to swap.

I have shown that Timsort performs well for vectors of blocks with data Type 1 (oscillating data) with few duplicates. In contrast, when we deal with arrays of strings, where swap operations involve just swapping pointers, Timsort, although performs sometimes better than non-parallel Quicksort methods, does not outperform multithreading  Quicksort methods -- Par Hoare, Par Dual-Pivot and Par Three-Way. The results for strings are shown in Figure 18.

Image 18

Figure 18. Testing Arrays with Partially-Sorted Data Type 1. Each element is of type std::string, which make it easy to swap them.

Here the multithreading Three-Way works slightly better when it comes to sorting lots of duplicates.  But Par Dual-Pivot and Par Hoare are better when the number of duplicates decreases. Similar results you may find for the vectors of double floating-point values: the parallel Quicksort methods are the fastest.

A few words about the comparison between Timsort and Quicksort methods:

  1. Heap space: in most tests Timsort allocates extra k*sizeof(elem)*N bytes on the heap (where 0.125 <=k <= 0.5 and N is the size of the array). For example, for 1,000,000 blocks of 1000 bytes this methods used between 121MB and 487MB of the heap. In contrast, Quicksort needs about 1024 bytes for a stack, and does not use any extra heap space.
  2. Timsort is good at sorting big blocks of partially-sorted data, while Quicksort is better at sorting arrays containing containers, simple types and pointers, where the data may be totally random.  In practice, Timsort is good for sorting database data.
  3. Quicksort can be easily parallelized: it is more difficult to do it for Timsort. There is no parallel implementation for it yet. My feeling is that when it comes to parallel implementation it probably will need even more heap space. In principle, it is always possible to split array into parts and merge them, but extra heap is needed [16].  Quicksort can be easily parallelized and take advantage of the multicore processors.
  4. For arrays with lots of duplicates, Dual-Pivot Quicksort and Three-Way Quicksort are the fastest methods among the discussed so far. 

A Real-world Problem:  Extracting Words from a Text and Presenting Them in Alphabetical Order

Here is a practical problem: extracting all the distinct words from a text and sorting them in ascending order. It does not matter  how we store the final result (an ordered set or a vector). The results are shown in Figure 19.

The exact problem is this. A vector of all the 1.5 million words from a text, already converted into uppercase is given. The task is to convert it to a collection of distinct words presented in alphabetical order. In this case, the resulting collection will contain 24,355 words. I used a text  combined from several books by Alexandre Dumas [17]. I tried texts in other European languages: the results were similar.

Image 19

Figure 19.  Extracting distinct words from a text and sorting them

For example, for Hoare Quicksort I measured the execution time of the following code:

HoareQuicksort(std::begin(words), std::end(words));
auto it = std::unique(words.begin(), words.end());
words.erase(it, words.end());

For std::set the following time was measured:

std::set<std::string> st;
for (auto& x : words)
{
   st.insert(std::move(x));
}

Contrary to what some of the readers may expect, using  std::set is  not the fastest way. Timsort and Heapsort are the slowest. And multithreading Dual-Pivot and Three-Way Quicksort are the fastest.

Conclusion

Dual-Pivot QuickSort is faster than Hoare in all cases. Three-Way Quicksort may be sligtly faster than Dual-Pivot in some cases with a lot of duplicates, but when the number of duplicates is small (number distinct elements is over 95% of the size of the array) Dual-Pivot is the winner. The average speed increase of multithreaded implementation is roughly propotionate to the number of cores.  Since we live in a multi-core processor world, it would be good if multithreaded sorting algorithms were provided in the C++ Standard Library.

Using the Code

All the attached code should be compiled with optimization. For GCC C++, it is recommended to use the following options:

g++ -m64 -Ofast -march=native -funroll-loops -ffast-math -std=c++14 %1.cpp -o %1.exe

For VC++, I would suggest the following options:Image 20

In addition, you may consider using the following option:

Enable Enhanced Instruction Set Advanced Vector Extensions 2 (/arch:AVX2)

Major Revisions

  1. 31 August 2015. First Version.
  2. 06 September 2015. Comparison with Timsort, Mergesort and Heapsort has been added;  benchmarks for partially-sorted data.
  3. 13 September 2015.  The standard Heapsort  (a combination of std::make_heap and std::sort_heap) is used in these examples. A different more efficient, implementation of Mergesort is provided. A real-word example is added: extracting distinct words from a text.

References

1. Hoare, C.A.R. (1961). Algorithm 63, Partition: Algorithm 64, Quicksort: Communications of the ACM, Vol. 4,

p.321.

2. Hoare, C.A.R. (1962). QuickSort. The Computer Journal, Vol. 5  (1):  pp. 10-16.

3. http://comjnl.oxfordjournals.org/content/5/1/10.full.pdf

4. http://www.mif.vu.lt/~valdas/ALGORITMAI/LITERATURA/Cormen/Cormen.pdf

5. http://fpl.cs.depaul.edu/jriely/ds2/extras/demos/23DemoPartitioningDijkstra.pdf

6. Bentley, JON L., McIlroy, M. Douglas (1993). Engineering a Sort Function. SOFTWARE—PRACTICE AND EXPERIENCE, Vol.

23(11), pp. 1249–1265.

7. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.14.8162&rep=rep1&type=pdf

8. Sedgewick, Robert; Wayne, Kevin (2011). Algorithms (4th ed.). Addison-Wesley Professional.

9. http://algs4.cs.princeton.edu/lectures/23Quicksort.pdf

10. http://algs4.cs.princeton.edu/20sorting/

11. http://iaroslavski.narod.ru/quicksort/DualPivotQuicksort.pdf

12. http://www.drmaciver.com/2010/01/understanding-timsort-1adaptive-mergesort/

13. http://www.envisage-project.eu/proving-android-java-and-python-sorting-algorithm-is-broken-and-how-to-fix-it/

14. https://github.com/gfx/cpp-TimSort

15. http://www.gamasutra.com/view/news/172542/Indepth_Smoothsort_vs_Timsort.php

16. http://www.javacodegeeks.com/2013/04/arrays-sort-versus-arrays-parallelsort.html

17. http://www.gutenberg.org/files/1259/1259-h/1259-h.htm

 

 

 

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer (Senior)
United Kingdom United Kingdom
This member has not yet provided a Biography. Assume it's interesting and varied, and probably something to do with programming.

Comments and Discussions

 
GeneralMy vote of 5 Pin
Arthur V. Ratz23-Nov-16 20:38
professionalArthur V. Ratz23-Nov-16 20:38 
QuestionBad benchmark and bad code Pin
Byron Goodman23-Oct-16 15:22
Byron Goodman23-Oct-16 15:22 
First, I want to say, this implementation of quicksort is exactly how you don't want to do it. The dual pivot version of the quicksort implementation will perform as fast with 2 threads as this intrusive stack version without synchronization.

On the dual pivot implementation you can add 5 calls and add a queue and it will perform better, with out the taxing thread thrash the multi-process stack version gives you. Not to mention on the stack version, with 7 threads, their all fighting.

C++
threads.push(std::thread(dualpivot_quicksort<Iterator, Comparison>, a, a + less - 1, compare));
threads.push(std::thread(dualpivot_quicksort<Iterator, Comparison>, a + greater + 2, finish, compare));
threads.push(std::thread(dualpivot_quicksort<Iterator, Comparison>, a + less, a + greater + 1, compare));


I've implemented it below, and if battery life and other trade offs are important, use this one.

C++
template <class Iterator, class Comparison = std::less<>>
void dualpivot_quicksort_mp(Iterator a, Iterator finish, Comparison compare = Comparison()) {

  std::ptrdiff_t first = 0;
  std::ptrdiff_t last = (finish - a) - 1;

  if (last <= first) return;    

  std::swap(a[(first + (last - first) / 3)], a[first]);
  std::swap(a[(last - (last - first) / 3)], a[last]);

  if (compare(a[last], a[first])) std::swap(a[first], a[last]);

  if (last - first == 1) return;

  auto pivot1 = a[first];
  auto pivot2 = a[last];

  std::queue<std::thread> threads;

  if (compare(pivot1, pivot2)) {
    std::ptrdiff_t less = first + 1;
    std::ptrdiff_t greater = last - 1;

    while (!compare(a[greater], pivot2)) greater--;
    while (!compare(pivot1, a[less])) less++;

    std::ptrdiff_t k = less;
    while (k <= greater) {
      if (!compare(pivot1, a[k])) {
        std::swap(a[k], a[less++]);
      } else if (!compare(a[k], pivot2)) {
        std::swap(a[k], a[greater--]);
        while (!compare(a[greater], pivot2)) { greater--; }
        if (!compare(pivot1, a[k])) std::swap(a[k], a[less++]);
      }
      k++;
    }
    std::swap(a[less - 1], a[first]);
    std::swap(a[greater + 1], a[last]);
    
    threads.push(std::thread(dualpivot_quicksort<Iterator, Comparison>, a, a + less - 1, compare));
    threads.push(std::thread(dualpivot_quicksort<Iterator, Comparison>, a + greater + 2, finish, compare));
    threads.push(std::thread(dualpivot_quicksort<Iterator, Comparison>, a + less, a + greater + 1, compare));
  } else {
    std::ptrdiff_t less = first + 1;
    std::ptrdiff_t greater = last - 1;

    while (compare(pivot1, a[greater])) greater--;    
    while (compare(a[less], pivot1)) less++;
       
    std::ptrdiff_t k = less;
    while (k <= greater) {
      if (compare(a[k], pivot1)) {
        std::swap(a[k], a[less++]);
      } else if (compare(pivot1, a[k])) {
        std::swap(a[k], a[greater--]);
        while (compare(pivot1, a[greater])) greater--;
        if (compare(a[k], pivot1)) {
          std::swap(a[k], a[less++]);
        }
      }
      k++;
    }
    std::swap(a[less - 1], a[first]);
    std::swap(a[greater + 1], a[last]);       
    threads.push(std::thread(dualpivot_quicksort<Iterator, Comparison>, a, a + less - 1, compare));
    threads.push(std::thread(dualpivot_quicksort<Iterator, Comparison>, a + greater + 2, finish, compare));
  }

  while (!threads.empty()) {
    threads.front().join();
    threads.pop();
  }    
}



Below my benchmarks for comparison:

Running iterations 8 for test insertsort() for time 0
Running iterations 8 for test quicksort() for time 0
Running iterations 8 for test dualpivot_quicksort() for time 0
Running iterations 8 for test stack_based_quicksort() for time 0
Running iterations 8 for test dualpivot_quicksort_mp() for time 0.0156268
Running iterations 8 for test std::sort() for time 0
----------------------------------------------------------------------------------------
Running iterations 16 for test insertsort() for time 0
Running iterations 16 for test quicksort() for time 0
Running iterations 16 for test dualpivot_quicksort() for time 0
Running iterations 16 for test stack_based_quicksort() for time 0
Running iterations 16 for test dualpivot_quicksort_mp() for time 0.0312484
Running iterations 16 for test std::sort() for time 0
----------------------------------------------------------------------------------------
Running iterations 32 for test insertsort() for time 0
Running iterations 32 for test quicksort() for time 0
Running iterations 32 for test dualpivot_quicksort() for time 0
Running iterations 32 for test stack_based_quicksort() for time 0
Running iterations 32 for test dualpivot_quicksort_mp() for time 0.0156254
Running iterations 32 for test std::sort() for time 0
----------------------------------------------------------------------------------------
Running iterations 64 for test insertsort() for time 0
Running iterations 64 for test quicksort() for time 0
Running iterations 64 for test dualpivot_quicksort() for time 0
Running iterations 64 for test stack_based_quicksort() for time 0
Running iterations 64 for test dualpivot_quicksort_mp() for time 0.0211347
Running iterations 64 for test std::sort() for time 0
----------------------------------------------------------------------------------------
Running iterations 128 for test insertsort() for time 0
Running iterations 128 for test quicksort() for time 0.0004998
Running iterations 128 for test dualpivot_quicksort() for time 0
Running iterations 128 for test stack_based_quicksort() for time 0
Running iterations 128 for test dualpivot_quicksort_mp() for time 0.015979
Running iterations 128 for test std::sort() for time 0
----------------------------------------------------------------------------------------
Running iterations 256 for test insertsort() for time 0
Running iterations 256 for test quicksort() for time 0
Running iterations 256 for test dualpivot_quicksort() for time 0
Running iterations 256 for test stack_based_quicksort() for time 0
Running iterations 256 for test dualpivot_quicksort_mp() for time 0.0156254
Running iterations 256 for test std::sort() for time 0
----------------------------------------------------------------------------------------
Running iterations 512 for test insertsort() for time 0
Running iterations 512 for test quicksort() for time 0
Running iterations 512 for test dualpivot_quicksort() for time 0
Running iterations 512 for test stack_based_quicksort() for time 0
Running iterations 512 for test dualpivot_quicksort_mp() for time 0.0156251
Running iterations 512 for test std::sort() for time 0
----------------------------------------------------------------------------------------
Running iterations 1024 for test insertsort() for time 0
Running iterations 1024 for test dualpivot_quicksort() for time 0
Running iterations 1024 for test stack_based_quicksort() for time 0
Running iterations 1024 for test dualpivot_quicksort_mp() for time 0.015625
Running iterations 1024 for test std::sort() for time 0
----------------------------------------------------------------------------------------
Running iterations 2048 for test insertsort() for time 0
Running iterations 2048 for test dualpivot_quicksort() for time 0
Running iterations 2048 for test stack_based_quicksort() for time 0
Running iterations 2048 for test dualpivot_quicksort_mp() for time 0.0156239
Running iterations 2048 for test std::sort() for time 0
----------------------------------------------------------------------------------------
Running iterations 4096 for test insertsort() for time 0.0065287
Running iterations 4096 for test dualpivot_quicksort() for time 0
Running iterations 4096 for test stack_based_quicksort() for time 0
Running iterations 4096 for test dualpivot_quicksort_mp() for time 0.0156257
Running iterations 4096 for test std::sort() for time 0
----------------------------------------------------------------------------------------
Running iterations 8192 for test dualpivot_quicksort() for time 0
Running iterations 8192 for test stack_based_quicksort() for time 0.0156414
Running iterations 8192 for test dualpivot_quicksort_mp() for time 0.0156075
Running iterations 8192 for test std::sort() for time 0
----------------------------------------------------------------------------------------
Running iterations 16384 for test dualpivot_quicksort() for time 0
Running iterations 16384 for test stack_based_quicksort() for time 0.0156263
Running iterations 16384 for test dualpivot_quicksort_mp() for time 0.0156242
Running iterations 16384 for test stack_quicksort_mp(2) for time 0.0216353
Running iterations 16384 for test stack_quicksort_mp(3) for time 0.0157114
Running iterations 16384 for test stack_quicksort_mp(4) for time 0.0312493
Running iterations 16384 for test stack_quicksort_mp(5) for time 0.0156259
Running iterations 16384 for test stack_quicksort_mp(6) for time 0.0372593
Running iterations 16384 for test stack_quicksort_mp(7) for time 0.015637
Running iterations 16384 for test stack_quicksort_mp(8) for time 0.0312508
Running iterations 16384 for test stack_quicksort_mp(9) for time 0.0312492
Running iterations 16384 for test stack_quicksort_mp(10) for time 0.0377994
Running iterations 16384 for test stack_quicksort_mp(11) for time 0.0312501
Running iterations 16384 for test stack_quicksort_mp(12) for time 0.0312495
Running iterations 16384 for test std::sort() for time 0.0156427
----------------------------------------------------------------------------------------
Running iterations 32768 for test dualpivot_quicksort() for time 0.0025033
Running iterations 32768 for test stack_based_quicksort() for time 0.01564
Running iterations 32768 for test dualpivot_quicksort_mp() for time 0.0156104
Running iterations 32768 for test stack_quicksort_mp(2) for time 0.0156242
Running iterations 32768 for test stack_quicksort_mp(3) for time 0.0312504
Running iterations 32768 for test stack_quicksort_mp(4) for time 0.021636
Running iterations 32768 for test stack_quicksort_mp(5) for time 0.0312682
Running iterations 32768 for test stack_quicksort_mp(6) for time 0.0312496
Running iterations 32768 for test stack_quicksort_mp(7) for time 0.0156242
Running iterations 32768 for test stack_quicksort_mp(8) for time 0.0378915
Running iterations 32768 for test stack_quicksort_mp(9) for time 0.0312493
Running iterations 32768 for test stack_quicksort_mp(10) for time 0.0156244
Running iterations 32768 for test stack_quicksort_mp(11) for time 0.037259
Running iterations 32768 for test stack_quicksort_mp(12) for time 0.0468819
Running iterations 32768 for test std::sort() for time 0
----------------------------------------------------------------------------------------
Running iterations 65536 for test dualpivot_quicksort() for time 0.0156245
Running iterations 65536 for test stack_based_quicksort() for time 0.0226327
Running iterations 65536 for test dualpivot_quicksort_mp() for time 0.015626
Running iterations 65536 for test stack_quicksort_mp(2) for time 0.0312498
Running iterations 65536 for test stack_quicksort_mp(3) for time 0.0312508
Running iterations 65536 for test stack_quicksort_mp(4) for time 0.0216336
Running iterations 65536 for test stack_quicksort_mp(5) for time 0.0312706
Running iterations 65536 for test stack_quicksort_mp(6) for time 0.0312501
Running iterations 65536 for test stack_quicksort_mp(7) for time 0.0377584
Running iterations 65536 for test stack_quicksort_mp(8) for time 0.0312926
Running iterations 65536 for test stack_quicksort_mp(9) for time 0.0312495
Running iterations 65536 for test stack_quicksort_mp(10) for time 0.0367593
Running iterations 65536 for test stack_quicksort_mp(11) for time 0.0325875
Running iterations 65536 for test stack_quicksort_mp(12) for time 0.0312498
Running iterations 65536 for test std::sort() for time 0.0221503
----------------------------------------------------------------------------------------
Running iterations 131072 for test dualpivot_quicksort() for time 0.0468752
Running iterations 131072 for test stack_based_quicksort() for time 0.0312718
Running iterations 131072 for test dualpivot_quicksort_mp() for time 0.0372443
Running iterations 131072 for test stack_quicksort_mp(2) for time 0.0312496
Running iterations 131072 for test stack_quicksort_mp(3) for time 0.0312504
Running iterations 131072 for test stack_quicksort_mp(4) for time 0.0378444
Running iterations 131072 for test stack_quicksort_mp(5) for time 0.0312507
Running iterations 131072 for test stack_quicksort_mp(6) for time 0.0312487
Running iterations 131072 for test stack_quicksort_mp(7) for time 0.0377631
Running iterations 131072 for test stack_quicksort_mp(8) for time 0.0468743
Running iterations 131072 for test stack_quicksort_mp(9) for time 0.0372604
Running iterations 131072 for test stack_quicksort_mp(10) for time 0.0314503
Running iterations 131072 for test stack_quicksort_mp(11) for time 0.046874
Running iterations 131072 for test stack_quicksort_mp(12) for time 0.0372663
Running iterations 131072 for test std::sort() for time 0.0468898
----------------------------------------------------------------------------------------
Running iterations 262144 for test dualpivot_quicksort() for time 0.0781411
Running iterations 262144 for test stack_based_quicksort() for time 0.0690516
Running iterations 262144 for test dualpivot_quicksort_mp() for time 0.049367
Running iterations 262144 for test stack_quicksort_mp(2) for time 0.034287
Running iterations 262144 for test stack_quicksort_mp(3) for time 0.0468901
Running iterations 262144 for test stack_quicksort_mp(4) for time 0.0534224
Running iterations 262144 for test stack_quicksort_mp(5) for time 0.0468735
Running iterations 262144 for test stack_quicksort_mp(6) for time 0.0536058
Running iterations 262144 for test stack_quicksort_mp(7) for time 0.0468749
Running iterations 262144 for test stack_quicksort_mp(8) for time 0.0530087
Running iterations 262144 for test stack_quicksort_mp(9) for time 0.0468746
Running iterations 262144 for test stack_quicksort_mp(10) for time 0.0533858
Running iterations 262144 for test stack_quicksort_mp(11) for time 0.0655061
Running iterations 262144 for test stack_quicksort_mp(12) for time 0.0502152
Running iterations 262144 for test std::sort() for time 0.0846942
----------------------------------------------------------------------------------------
Running iterations 524288 for test dualpivot_quicksort() for time 0.147191
Running iterations 524288 for test stack_based_quicksort() for time 0.131547
Running iterations 524288 for test dualpivot_quicksort_mp() for time 0.0846576
Running iterations 524288 for test stack_quicksort_mp(2) for time 0.0845189
Running iterations 524288 for test stack_quicksort_mp(3) for time 0.0686224
Running iterations 524288 for test stack_quicksort_mp(4) for time 0.0845127
Running iterations 524288 for test stack_quicksort_mp(5) for time 0.0841175
Running iterations 524288 for test stack_quicksort_mp(6) for time 0.0781748
Running iterations 524288 for test stack_quicksort_mp(7) for time 0.0841494
Running iterations 524288 for test stack_quicksort_mp(8) for time 0.0847737
Running iterations 524288 for test stack_quicksort_mp(9) for time 0.0843241
Running iterations 524288 for test stack_quicksort_mp(10) for time 0.0842872
Running iterations 524288 for test stack_quicksort_mp(11) for time 0.0846347
Running iterations 524288 for test stack_quicksort_mp(12) for time 0.0781245
----------------------------------------------------------------------------------------
Running iterations 1048576 for test dualpivot_quicksort() for time 0.278731
Running iterations 1048576 for test stack_based_quicksort() for time 0.269396
Running iterations 1048576 for test dualpivot_quicksort_mp() for time 0.169205
Running iterations 1048576 for test stack_quicksort_mp(2) for time 0.147127
Running iterations 1048576 for test stack_quicksort_mp(3) for time 0.15323
Running iterations 1048576 for test stack_quicksort_mp(4) for time 0.146775
Running iterations 1048576 for test stack_quicksort_mp(5) for time 0.153256
Running iterations 1048576 for test stack_quicksort_mp(6) for time 0.146844
Running iterations 1048576 for test stack_quicksort_mp(7) for time 0.152882
Running iterations 1048576 for test stack_quicksort_mp(8) for time 0.163259
Running iterations 1048576 for test stack_quicksort_mp(9) for time 0.137482
Running iterations 1048576 for test stack_quicksort_mp(10) for time 0.162742
Running iterations 1048576 for test stack_quicksort_mp(11) for time 0.153242
Running iterations 1048576 for test stack_quicksort_mp(12) for time 0.146935
----------------------------------------------------------------------------------------
Running iterations 2097152 for test dualpivot_quicksort() for time 0.583025
Running iterations 2097152 for test stack_based_quicksort() for time 0.552213
Running iterations 2097152 for test dualpivot_quicksort_mp() for time 0.303368
Running iterations 2097152 for test stack_quicksort_mp(2) for time 0.282669
Running iterations 2097152 for test stack_quicksort_mp(3) for time 0.285205
Running iterations 2097152 for test stack_quicksort_mp(4) for time 0.28429
Running iterations 2097152 for test stack_quicksort_mp(5) for time 0.284877
Running iterations 2097152 for test stack_quicksort_mp(6) for time 0.284517
Running iterations 2097152 for test stack_quicksort_mp(7) for time 0.300637
Running iterations 2097152 for test stack_quicksort_mp(8) for time 0.284645
Running iterations 2097152 for test stack_quicksort_mp(9) for time 0.298887
Running iterations 2097152 for test stack_quicksort_mp(10) for time 0.279249
Running iterations 2097152 for test stack_quicksort_mp(11) for time 0.284685
Running iterations 2097152 for test stack_quicksort_mp(12) for time 0.285163
----------------------------------------------------------------------------------------
Running iterations 4194304 for test dualpivot_quicksort() for time 1.19118
Running iterations 4194304 for test stack_based_quicksort() for time 1.14091
Running iterations 4194304 for test dualpivot_quicksort_mp() for time 0.622498
Running iterations 4194304 for test stack_quicksort_mp(2) for time 0.586018
Running iterations 4194304 for test stack_quicksort_mp(3) for time 0.601167
Running iterations 4194304 for test stack_quicksort_mp(4) for time 0.601555
Running iterations 4194304 for test stack_quicksort_mp(5) for time 0.60065
Running iterations 4194304 for test stack_quicksort_mp(6) for time 0.58532
Running iterations 4194304 for test stack_quicksort_mp(7) for time 0.585872
Running iterations 4194304 for test stack_quicksort_mp(8) for time 0.601229
Running iterations 4194304 for test stack_quicksort_mp(9) for time 0.600673
Running iterations 4194304 for test stack_quicksort_mp(10) for time 0.58509
Running iterations 4194304 for test stack_quicksort_mp(11) for time 0.579454
Running iterations 4194304 for test stack_quicksort_mp(12) for time 0.60134
----------------------------------------------------------------------------------------
Running iterations 8388608 for test dualpivot_quicksort() for time 2.42266
Running iterations 8388608 for test stack_based_quicksort() for time 2.27543
Running iterations 8388608 for test dualpivot_quicksort_mp() for time 1.38793
Running iterations 8388608 for test stack_quicksort_mp(2) for time 1.2712
Running iterations 8388608 for test stack_quicksort_mp(3) for time 1.30338
Running iterations 8388608 for test stack_quicksort_mp(4) for time 1.28128
Running iterations 8388608 for test stack_quicksort_mp(5) for time 1.28735
Running iterations 8388608 for test stack_quicksort_mp(6) for time 1.28782
Running iterations 8388608 for test stack_quicksort_mp(7) for time 1.28779
Running iterations 8388608 for test stack_quicksort_mp(8) for time 1.27236
Running iterations 8388608 for test stack_quicksort_mp(9) for time 1.28558
Running iterations 8388608 for test stack_quicksort_mp(10) for time 1.24904
Running iterations 8388608 for test stack_quicksort_mp(11) for time 1.25582
Running iterations 8388608 for test stack_quicksort_mp(12) for time 1.24985
----------------------------------------------------------------------------------------
Running iterations 16777216 for test dualpivot_quicksort() for time 4.96115
Running iterations 16777216 for test stack_based_quicksort() for time 4.83202
Running iterations 16777216 for test dualpivot_quicksort_mp() for time 2.7195
Running iterations 16777216 for test stack_quicksort_mp(2) for time 2.63867
Running iterations 16777216 for test stack_quicksort_mp(3) for time 2.60734
Running iterations 16777216 for test stack_quicksort_mp(4) for time 2.57612
Running iterations 16777216 for test stack_quicksort_mp(5) for time 2.59121
Running iterations 16777216 for test stack_quicksort_mp(6) for time 2.57195
Running iterations 16777216 for test stack_quicksort_mp(7) for time 2.60687
Running iterations 16777216 for test stack_quicksort_mp(8) for time 2.58505
Running iterations 16777216 for test stack_quicksort_mp(9) for time 2.57217
Running iterations 16777216 for test stack_quicksort_mp(10) for time 2.55992
Running iterations 16777216 for test stack_quicksort_mp(11) for time 2.53734
Running iterations 16777216 for test stack_quicksort_mp(12) for time 2.54457
----------------------------------------------------------------------------------------
Running iterations 33554432 for test dualpivot_quicksort() for time 10.0238
Running iterations 33554432 for test stack_based_quicksort() for time 10.0771
Running iterations 33554432 for test dualpivot_quicksort_mp() for time 5.47419
Running iterations 33554432 for test stack_quicksort_mp(2) for time 5.41051
Running iterations 33554432 for test stack_quicksort_mp(3) for time 5.29553
Running iterations 33554432 for test stack_quicksort_mp(4) for time 5.25964
Running iterations 33554432 for test stack_quicksort_mp(5) for time 5.18363
Running iterations 33554432 for test stack_quicksort_mp(6) for time 5.19882
Running iterations 33554432 for test stack_quicksort_mp(7) for time 5.28324
Running iterations 33554432 for test stack_quicksort_mp(8) for time 5.31409
Running iterations 33554432 for test stack_quicksort_mp(9) for time 5.18343
Running iterations 33554432 for test stack_quicksort_mp(10) for time 5.18293
Running iterations 33554432 for test stack_quicksort_mp(11) for time 5.16093
Running iterations 33554432 for test stack_quicksort_mp(12) for time 5.19816
----------------------------------------------------------------------------------------
Running iterations 67108864 for test dualpivot_quicksort() for time 20.603
Running iterations 67108864 for test stack_based_quicksort() for time 20.3067
Running iterations 67108864 for test dualpivot_quicksort_mp() for time 10.6964
Running iterations 67108864 for test stack_quicksort_mp(2) for time 10.412
Running iterations 67108864 for test stack_quicksort_mp(3) for time 10.5443
Running iterations 67108864 for test stack_quicksort_mp(4) for time 10.3822
Running iterations 67108864 for test stack_quicksort_mp(5) for time 10.3959
Running iterations 67108864 for test stack_quicksort_mp(6) for time 10.4962
Running iterations 67108864 for test stack_quicksort_mp(7) for time 10.4596
Running iterations 67108864 for test stack_quicksort_mp(8) for time 10.4279
Running iterations 67108864 for test stack_quicksort_mp(9) for time 10.3704
Running iterations 67108864 for test stack_quicksort_mp(10) for time 10.4421
Running iterations 67108864 for test stack_quicksort_mp(11) for time 10.5573
Running iterations 67108864 for test stack_quicksort_mp(12) for time 10.6263
----------------------------------------------------------------------------------------
Running iterations 134217728 for test dualpivot_quicksort() for time 41.9316
Running iterations 134217728 for test stack_based_quicksort() for time 42.1659
Running iterations 134217728 for test dualpivot_quicksort_mp() for time 22.3599
Running iterations 134217728 for test stack_quicksort_mp(2) for time 21.0724
Running iterations 134217728 for test stack_quicksort_mp(3) for time 21.0664
Running iterations 134217728 for test stack_quicksort_mp(4) for time 21.1594
Running iterations 134217728 for test stack_quicksort_mp(5) for time 21.0257
Running iterations 134217728 for test stack_quicksort_mp(6) for time 21.1477
Running iterations 134217728 for test stack_quicksort_mp(7) for time 21.0225
Running iterations 134217728 for test stack_quicksort_mp(8) for time 21.0249
Running iterations 134217728 for test stack_quicksort_mp(9) for time 20.976
Running iterations 134217728 for test stack_quicksort_mp(10) for time 23.8967
Running iterations 134217728 for test stack_quicksort_mp(11) for time 21.0472
Running iterations 134217728 for test stack_quicksort_mp(12) for time 21.2111
----------------------------------------------------------------------------------------
Running iterations 268435456 for test dualpivot_quicksort() for time 85.3401
Running iterations 268435456 for test stack_based_quicksort() for time 83.9478
Running iterations 268435456 for test dualpivot_quicksort_mp() for time 43.1859
Running iterations 268435456 for test stack_quicksort_mp(2) for time 51.3237
Running iterations 268435456 for test stack_quicksort_mp(3) for time 62.8988
Running iterations 268435456 for test stack_quicksort_mp(4) for time 52.3754
Running iterations 268435456 for test stack_quicksort_mp(5) for time 42.0136
Running iterations 268435456 for test stack_quicksort_mp(6) for time 42.2371
Running iterations 268435456 for test stack_quicksort_mp(7) for time 42.0085
Running iterations 268435456 for test stack_quicksort_mp(8) for time 42.1299
Running iterations 268435456 for test stack_quicksort_mp(9) for time 48.7176
Running iterations 268435456 for test stack_quicksort_mp(10) for time 62.2107
Running iterations 268435456 for test stack_quicksort_mp(11) for time 54.8983
Running iterations 268435456 for test stack_quicksort_mp(12) for time 41.9713
----------------------------------------------------------------------------------------
Running iterations 536870912 for test dualpivot_quicksort() for time 208.632
Running iterations 536870912 for test stack_based_quicksort() for time 167.795
Running iterations 536870912 for test dualpivot_quicksort_mp() for time 84.5346
Running iterations 536870912 for test stack_quicksort_mp(2) for time 124.901
Running iterations 536870912 for test stack_quicksort_mp(3) for time 84.6438
Running iterations 536870912 for test stack_quicksort_mp(4) for time 84.2411
Running iterations 536870912 for test stack_quicksort_mp(5) for time 103.182
Running iterations 536870912 for test stack_quicksort_mp(6) for time 92.9349
Running iterations 536870912 for test stack_quicksort_mp(7) for time 86.178
Running iterations 536870912 for test stack_quicksort_mp(8) for time 84.8448
Running iterations 536870912 for test stack_quicksort_mp(9) for time 85.0601
Running iterations 536870912 for test stack_quicksort_mp(10) for time 84.2921
Running iterations 536870912 for test stack_quicksort_mp(11) for time 98.0966
Running iterations 536870912 for test stack_quicksort_mp(12) for time 109.874
----------------------------------------------------------------------------------------
Running iterations 1073741824 for test dualpivot_quicksort() for time 354.323
Running iterations 1073741824 for test stack_based_quicksort() for time 350.627
Running iterations 1073741824 for test dualpivot_quicksort_mp() for time 211.557
Running iterations 1073741824 for test stack_quicksort_mp(2) for time 189.188
Running iterations 1073741824 for test stack_quicksort_mp(3) for time 162.83
Running iterations 1073741824 for test stack_quicksort_mp(4) for time 164.448
Running iterations 1073741824 for test stack_quicksort_mp(5) for time 189.173
Running iterations 1073741824 for test stack_quicksort_mp(6) for time 164.519
Running iterations 1073741824 for test stack_quicksort_mp(7) for time 187.93
Running iterations 1073741824 for test stack_quicksort_mp(8) for time 176.352
Running iterations 1073741824 for test stack_quicksort_mp(9) for time 169.892
Running iterations 1073741824 for test stack_quicksort_mp(10) for time 174.194
Running iterations 1073741824 for test stack_quicksort_mp(11) for time 164.645
Running iterations 1073741824 for test stack_quicksort_mp(12) for time 183.62
----------------------------------------------------------------------------------------
Running iterations 2147483648 for test dualpivot_quicksort() for time 833.86
Running iterations 2147483648 for test stack_based_quicksort() for time 792.975
Running iterations 2147483648 for test dualpivot_quicksort_mp() for time 402.89
Running iterations 2147483648 for test stack_quicksort_mp(2) for time 369.152
Running iterations 2147483648 for test stack_quicksort_mp(3) for time 434.237
Running iterations 2147483648 for test stack_quicksort_mp(4) for time 350.387
Running iterations 2147483648 for test stack_quicksort_mp(5) for time 335.642
Running iterations 2147483648 for test stack_quicksort_mp(6) for time 332.848
Running iterations 2147483648 for test stack_quicksort_mp(7) for time 356.625
Running iterations 2147483648 for test stack_quicksort_mp(8) for time 481.358
Running iterations 2147483648 for test stack_quicksort_mp(9) for time 485.626
Running iterations 2147483648 for test stack_quicksort_mp(10) for time 496.28
Running iterations 2147483648 for test stack_quicksort_mp(11) for time 502.038
Running iterations 2147483648 for test stack_quicksort_mp(12) for time 505.039
----------------------------------------------------------------------------------------


This article is exactly why there should be pier reviews before something like this is posted.
AnswerRe: Bad benchmark and bad code Pin
Mikhail Semenov19-Nov-17 4:15
Mikhail Semenov19-Nov-17 4:15 
GeneralRe: Bad benchmark and bad code Pin
Byron Goodman19-Nov-17 7:04
Byron Goodman19-Nov-17 7:04 
GeneralRe: Bad benchmark and bad code Pin
Mikhail Semenov19-Nov-17 7:58
Mikhail Semenov19-Nov-17 7:58 
GeneralRe: Bad benchmark and bad code Pin
Byron Goodman19-Nov-17 8:13
Byron Goodman19-Nov-17 8:13 
GeneralRe: Bad benchmark and bad code Pin
Mikhail Semenov19-Nov-17 10:02
Mikhail Semenov19-Nov-17 10:02 
GeneralRe: Bad benchmark and bad code Pin
Byron Goodman19-Nov-17 10:55
Byron Goodman19-Nov-17 10:55 
GeneralRe: Bad benchmark and bad code Pin
Mikhail Semenov19-Nov-17 11:17
Mikhail Semenov19-Nov-17 11:17 
GeneralRe: Bad benchmark and bad code Pin
Byron Goodman19-Nov-17 12:55
Byron Goodman19-Nov-17 12:55 
QuestionMy vote of 5 Pin
simviews3-Dec-15 18:56
simviews3-Dec-15 18:56 
AnswerRe: My vote of 5 Pin
Mikhail Semenov3-Dec-15 23:40
Mikhail Semenov3-Dec-15 23:40 
GeneralMy vote of 5 Pin
Farhad Reza2-Oct-15 19:50
Farhad Reza2-Oct-15 19:50 
GeneralRe: My vote of 5 Pin
Mikhail Semenov2-Oct-15 22:39
Mikhail Semenov2-Oct-15 22:39 
GeneralMy vote of 5 Pin
Ștefan-Mihai MOGA16-Sep-15 2:22
professionalȘtefan-Mihai MOGA16-Sep-15 2:22 
GeneralRe: My vote of 5 Pin
Mikhail Semenov16-Sep-15 3:02
Mikhail Semenov16-Sep-15 3:02 
QuestionNovel Variations of QuickSort Pin
Member 105695742-Sep-15 20:14
Member 105695742-Sep-15 20:14 
AnswerRe: Novel Variations of QuickSort Pin
Mikhail Semenov3-Sep-15 3:26
Mikhail Semenov3-Sep-15 3:26 
GeneralRe: Novel Variations of QuickSort Pin
Ammar Muqaddas27-Oct-16 2:44
Ammar Muqaddas27-Oct-16 2:44 
GeneralRe: Novel Variations of QuickSort Pin
Mikhail Semenov27-Oct-16 4:13
Mikhail Semenov27-Oct-16 4:13 
GeneralRe: Novel Variations of QuickSort Pin
Ammar Muqaddas27-Oct-16 9:08
Ammar Muqaddas27-Oct-16 9:08 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.