Code: DeVectorTests.zip

1. Introduction

Double-ended vector (devector) has been discussed a few times. Some implementations are done by Lars Greger Nordland Hagen [1], Valasiadis Fotios [2] and there is also a Boost implementation [3].

The double-ended vector idea is rather simple: if in a vector the data starts with the beginning of the pre-allocated space, in a double-ended vector the data is positioned relatively in the middle, which means that for a while new items can be pushed at both ends without re-allocating space or moving the elements.

In Figure 1, the std::vector memory reservation is shown.

Figure 1. std::vector memory reservation. The data space always starts with the start of the reserved memory, the size of which is called capacity. As the data grows, the capacity should increase.

If the data needs more space than the capacity, the new space with increased capacity should be allocated. The newly allocated space is usually greater than required (by a factor between 1.5 or 2.0). That allows not to allocate memory too often, as the data increases. The issue with the vector is that it is easy to append data to the end of a vector (push_back operation), but not to the beginning.

If data is positioned roughly in the middle of the reserved space, so that it can grow in both directions, we end up with double-ended vector or devector, which is shown in Figure 2.

Figure 2. Devector memory reservation. The data can grow in both directions, so that push_front as well as push_back can be easily performed.

The headroom is used for push_front and the tail capacity for push_back; if there is not enough room for either of them, usually a new memory slot can be reserved, which can be, for instance, three times the size of the required data, which will be positioned in the middle of the reserved slot.

2. Special Consideration: Balanced Insertion

One issue that should be considered is insertion. When an element is inserted into a vector, there is only one option -- to move the rest of the elements forward, by increasing their addresses in memory. In devector, there are two options:

to move the rest of elements forward;
to move the elements in front backwards.

Which option should be chosen? it depends on the position of the element. If it is closer to the end of the data, then option 1 (forward). If it is closer to the beginning, then option 2 (backwards). If it is in the middle, either option will do. In Figure 3, the basic idea of balanced insertion is illustrated.

Figure 3. Balanced Insertion into a devector.

Is this important? Yes, it is. Imagine if a devector is build using random insertion of elements. By using balanced insertion, we will cut the time by half! If a devector used to create a flat map, which contains "key-value" pairs as elements and binary search is used to find the right element. By using balanced devector, the insertion time into a flat map can be cut by half in comparison with time used by vector.

Unfortunately, in the Boost implementation (Boost devector), the insertion is not balanced and the benchmarks show it. I have included my own implementation, which I call DeVector, where both insertion and deletion are balanced.

3. Benchmarks

3.1. General Observation

At first, I tested devector in Visual C++ 2022 and saw major advantage in using devector. Then I tested in GCC 11.2 with C++17 and the results were not so pronounced. I am presenting results for both compilers.

3.2. System Used

The code was executed on a PC with Intel i7-4790, 3.6GHz processor, 16GB RAM.

When compiled in Visual C++ 2022, the following options were used:

In GCC 11.2, the following command line was used:

g++ -O3 -std=c++17 -I"/boost_1_80_0" -m64 DeVectorTests.cpp -o DeVectorTests.exe

3.3. Random Insertion

3.3.1. Results for Visual C++ 2022

In Figure 4a, the random insertion results are shown.

Figure 4a. Visual C++ 2022. Random insert of double floating-point values (microseconds per element). The data is in logarithmic scale.

It shows that random insertion into a vector or Boost devector is about twice as slow as into a DeVector, which uses balanced insertion. At the same time, insertion into a deque is on average 7 times slower than into a DeVector, which increases to 14 times for 5 million elements. What it means in practice for creating a structure of 5 million elements:

for std::deque, it takes 8 hours;
for std::vector and Boost devector, it takes 1 hour and 5 minutes;
for DeVector, it takes 34 minutes.

For 100,000 elements:

for std::deque, it takes 3 seconds;
for std::vector it takes about 0.7s;
and Boost devector, it takes about 0.85s;
for DeVector, it takes 0.5s.

For random insertion, the advantage double-ended vector over std::deque is obvious.

3.3.2. Results for GCC 11.2 C++

In Figure 4b, the random insertion results are shown.

Figure 4b. GCC 11.2 C++. Random insert of double floating-point values (microseconds per element).

There is some diffrence, but it is not as dramatic as in Visual C++: for the number of elements 5 million and more, deque takes over twice as much time as DeVector.

3.4. Random Access

3.4.1. Results for Visual C++ 2022

The random access results are shown in Figure 5a.

Figure 5a. Visual C++ 2022. Random access of double float-point values (nanoseconds per element).

You can see that there is no much difference in the results of up to 100,000 elements, but than the access to the deque elements slows down in contrast to all other structures.

3.4.2. Results for GCC 11.2 C++

The random access results are shown in Figure 5b.

Figure 5b. GCC 11.2 C++. Random access of double float-point values (nanoseconds per element).

There is not much diffrence. In practice, it takes 38 seconds to access 10⁹ elements in a deque, and it takes 32 seconds to access the same number of elements in a devector.

3.5. Insertion at the Beginning

3.5.1. Results for Visual C++ 2022

In all the structures, except for vector, this opertation is called push_front. As for vector, insert with index 0 is used. The results are shown in Figure 6a.

Figure 6a. Visual C++ 2022. Insertion at the beginning (nanoseconds/element).

For deque, it takes about 27 nanosecond per element; for double-ended vector about 10-12 ns/element; for vector, it is very inefficient. As a result, it takes over 2.5 hours to insert 5 million elements at the front of a vector, as opposed to a fraction of a second for all the other structures.

3.5.2. Results for GCC 11.2 C++

The results for push_front are shown in Figure 6b.

Figure 6b. GCC 11.2 C++. Results for push_front (nanoseconds per element).

There is no major difference. But you can see that deque is faster. But still we are taking about fractions of a second: for pushing to the front 10 million elements it takes 0.05s for a deque and about 0.097s for a devector. One billion elements take 4.4 seconds for deque and about 7.7 for both devectors.

3.6. The push_back Operation

3.6.1. Results for Visual C++ 2022

There are nothing special about push_back: it is a very quick operation and even for one million elements takes a fraction of a second. The results are shown in Figure 7a.

Figure 7a. push_back (nanoseconds/element).

The performance of deque is about 2.5 times slower than that of devectors.

3.6.2. GCC 11.2 C++

The results are shown in Figure 7b.

Figure 7b. The push_back operation (nanoseconds/element).

The results deque is slightly faster. In general the results are similar to push_front.

4. Conclusion

The results clearly demonstrate that deque is slower than double-ended vector, particularly in random construction with one-million elements or more. In random access, deque is slower as well. In Visual C++ deque is particularly inefficient. I was suprised to see how the difference between Visual C++ and GCC implementation: deque implementation is much better in the latter, where push operations are rather efficient.

Perhaps, C++ libraries should provide devector as one of the standard structures, like std::deque, std::vector and std::list. The only advantage of using deque instead of devector may be systems with small memory space, where it is difficult to allocate a lump memory as fragmented allocation could give some advantage.

In practice, for storing 10000 floating-point elements, deque is good enough. With more elements, special attention should be paid to the performance and devector should be preferred, where random insertion and access are more efficient.

References

1. http://larshagencpp.github.io/blog/2016/05/22/devector

2. https://github.com/fvalasiad/devector

3. https://www.boost.org/