15,851,387 members
Articles / General Programming / Algorithms

# Segmented Map, a Compromise Between Flat Map and std::map

Rate me:
18 Sep 2022CPOL5 min read 5.1K   61   4   8
This article proposes a mapping algorithm, called Segmented Map, which is almost as fast as Flat Map is random access and enumeration of elements and close to std::map in insertion of elements.
This article provides an implementation of a container, called segmented map, which is almost as fast as flat map in random access and enumeration of elements and close to std::map in insertion of elements. The source code is written in C++17 and can be compiled in any C++17 compiler. Benchmarks with the "int-double" and "string-string" key-value pairs are provided for Microsoft VC++ 2022 and GCC 11.2 C++.

## 1. Introduction

A map is a collection of key-value pairs with the following functionality:

• Insertion of a pair
• Search for a pair by a key (each key is unique, there can be no two pairs with the same key)
• Enumeration of the pairs in the order of their keys

There is a container, called` std::map`, in standard C++, which provided all these operations. But for a large number of elements, it does not enumerate them quickly. There have been long discussions and proposals for using flat map, which is a vector of pairs, ordered by key. Binary search is used to retrieve an element. Flat map is faster than `std::map` in enumeration of elements, but slow in insertion because in order to insert an element, some of the other elements in the vector have to be moved.

Segmented map` `is almost as fast in enumeration and search as flat map and close to `std::map` in insertions.

## 2. Segmented Map: Brief Description of Implementation

### 2.1. Overview of the Design

A segmented map is a vector of segments, where a segment is a vector of key-value pairs. Initially, there is only one segment with no pairs -- an empty segmented map. As elements are inserted, the segments will grow. A segment has a limit on its size. When any segment reaches its limit, it is split in half. A segmented map behaves like a multicellular organism, when its cells are dividing: the more it grows, the more segments are created.

The question is: what should be the segment size limit? In my tests, I found that the best value is somewhere between 100 and 300. In the implementation, I have provided the `MaxSliceSize` parameter, which defines the segment size limit.

### 2.2. The SegmentedMap Container

The beginning of the container is as follows:

C++
```template<class K, class V, std::size_t MaxSliceSize = 256,
class Compare = CompareAscending<K>>
class SegmentedMap
{
public:
using difference_type = int;
Compare comp;

using value_type = std::pair<K, V>;

private:

struct Segment
{
Segment() {}
Segment(std::size_t n, const K& first, const K& last) :m_elem(n),
m_first(first),m_last(last) {}
DeVector<value_type> m_elem;
K m_first;
K m_last;
};
...```

The `Segment` structure, which implements a segment, has two key fields `m_first` and `m_last`, which allow to find the range of the keys without analysing the `m_elem` array.

When a new segment is created, the required keys are copied. In this implementation, I did not use pointers to the existing keys. I haven't noticed any significant difference when `m_first` and `m_last `are pointers. Besides, the code is a bit simpler with copies.

The `MaxSliceSize` parameter defines the segment size limit.

I use `DeVector` [1] in the implementation, which is similar to `std::vector`, but provides fast `push_front` and well as `push_back`; and, on average, insertion of elements is twice as fast and in `std::vector`.

### 2.3. Insertion Algorithm

First of all, a segment has to be found by a given key. This is done by using binary search on `m_datax `array`.`

Once the correct segment is found, this is the code that does the insert (`index1` is the index of the segment, `index2` is the index where `elem` (the key-value pair) is going to be inserted):

C++
``` template<class U>
value_type& insert(std::size_t index1, std::size_t index2, U&& elem)
{
DeVector<value_type>& slice =
m_datax[index1].m_elem;//get the slice from segment[index1]
auto p = slice.insert(slice.begin() + index2,
std::forward<U>(elem)); // insert the element into position index2
if (slice.size() >= MaxSliceSize)
{
std::size_t halfSlice = MaxSliceSize / 2;

Segment segment(halfSlice,slice[0].first,
slice[halfSlice-1].first);//create a new segment with the given size
std::move(slice.begin(), slice.begin() + halfSlice, segment.m_elem.begin());
Segment segment2(slice.size() - halfSlice, slice[halfSlice].first,
slice[slice.size() - 1].first);        // create a second segment
std::move(slice.begin() + halfSlice,slice.end(),
segment2.m_elem.begin());              // move the rest of the slice
// into the second segment
std::swap(m_datax[index1], segment2);  // swap segment that contains
// slice with the second segment
m_datax.insert(m_datax.begin() + index1, std::move(segment));// insert
// the first segment into the map array before index1

if (index2 >= halfSlice) // if the new element is in the second segment
{
return m_datax[index1 + 1].m_elem[index2 - halfSlice];
}
else //if the new element is in the first segment
{
return m_datax[index1].m_elem[index2];
}
}
// if a segment is not split, simply update the keys
m_datax[index1].m_first = m_datax[index1].m_elem[0].first;
m_datax[index1].m_last =
m_datax[index1].m_elem[m_datax[index1].m_elem.size() - 1].first;
return *p;
}
```

This is defined as a template to allow to copy or move a pair, which is provided by the universal reference [2] (forwarding reference) on the `elem` parameter.

## 3. Benchmarking

### 3.1. Overview

I have tested the behaviour of `SegmentedMap`, using various keys on two compilers: Microsoft VC++ 2020 and GCC 11.2 C++. Here, I present two types of map key-value pairs:

1. "`int` key"-"`double` value";
2. "`string` key"-"`string` value" (the strings were between 21 and 27 characters long).

### 3.2. System Used

The code was executed on a PC with Intel i7-4790, 3.6GHz processor, 16GB RAM.

When compiled in Visual C++ 2022 (64-bit code), the following options were used:

In GCC, the following command line was:

`g++ -O3 -std=c++17  -m64 SegmentedMapTests.cpp -o SegmentedMapTests.exe`

### 3.3. Insertion Benchmarks

#### 3.3.1. Insertion for "int-double" pairs

The results are shown in Figure 1.

Figure 1. Insertion of an "int-double" pairs.

This clearly indicates that flat map is the slowest. Segmented map twice as fast as std `std::map`. Segmented map is definitely a winner. It took about 1.5 hours to build a `flat map `of 5,000,000 elements, whereas other maps took between 2.5 and 5 seconds.

#### 3.3.2. Insertion for "string-string" pairs

The `string` used in these tests were between 21 and 27 characters long. The results are shown in Figure 2.

Figure 2. Insertion of an "string-string" pairs.

Segmented Map is a bit slower than `std::map.`

### 3.4. Element Enumeration Benchmarks

In sequential access, there is nothing faster than flat map.

#### 3.4.1. Enumeration of "int-double" pairs

The results are shown in Figure 3.

Figure 3. Element enumeration for "int-double" pairs.

Although flat map is the fastest, segmented map is not far behind; `std::map `is` `rather` `slow. But the enumeration of the elements is still quite a fast operation: we are talking nanoseconds here. It still takes less than a second to enumerate all the 5000000 elements even using an `std::map` object.

#### 3.5. Enumeration of "string-string" pairs

The results are shown in Figure 4.

Figure 4. Element enumeration for "string-string" pairs.

For some reason, the results for segmented map are much better in GCC C++ than in VC++; the `std::map` speed is slow.

### 3.5. Random Access Benchmarks

Regarding random access, all the maps behave similar, `flat map `is generally the fastest.

#### 3.5.1. Random Access of "int-double" pairs

Figure 5 shows the results.

Figure 5. Random access for "int-double" pairs.

In` `VC++, segmented map is slower than flat map; in GCC C++, it is not the case. But, overall, segmented map is faster than `std::map`.

#### 3.5.2. Random Access of "string-string" pairs

The results are presented in Figure 6.

Figure 6. Random access for "string-string" pairs.

Segmented map's speed is between flat map and `std::map`.

## 4. Conclusion

The benchmarks show that segmented map behaves very well. It is faster than `std::map` in enumeration of elements (sequential access) and faster than flat map in insertions.

Which one can be recommended? It depends on the size of the data. If you deal with 1000 elements and don't enumerate all the elements often, use `std::map`. If you use no more than 1000 elements and the speed of sequential access is important, use flat map. But if the number of elements approach one million or more, I would recommend segmented map. It's impractical to use flat map: building it will take more than an hour. However, if sequential access is not that important, you may still be happy with `std::map`.

## History

• 18th September, 2022: Initial version

Written By
Software Developer (Senior)
United Kingdom
This member has not yet provided a Biography. Assume it's interesting and varied, and probably something to do with programming.

 First Prev Next
 A Few Source Details Questions BernardIE53172-Oct-22 18:23 BernardIE5317 2-Oct-22 18:23
 Thank You for the article. May I please inquire re/ the source i.e. to wit SegmentedMapTests.cpp. On line 107 I see a loop filling the indices vector which is not part of the tests subsequent but which is being timed none the less. Also may I please inquire the meaning of the sequence of values in braces on lines 24 and 36 etc. i.e. { 2.1, 3.7, 4.1, 5.5, 6.6 }; Also may I please inquire the purpose of comparing the vector of retrieved values beginning on line 91. How can the values differ and if they can why not also compare vz. Thank You Kindly PS May I say I am a bit surprised the code was not organized utilizing templates which would result in great simplification
 Re: A Few Source Details Questions Mikhail Semenov6-Oct-22 11:06 Mikhail Semenov 6-Oct-22 11:06
 Cucumbers^2/Furlongs? BernardIE531728-Sep-22 3:09 BernardIE5317 28-Sep-22 3:09
 Re: Cucumbers^2/Furlongs? Mikhail Semenov28-Sep-22 10:20 Mikhail Semenov 28-Sep-22 10:20
 Re: Cucumbers^2/Furlongs? BernardIE531728-Sep-22 10:56 BernardIE5317 28-Sep-22 10:56
 Re: Cucumbers^2/Furlongs? Mikhail Semenov28-Sep-22 22:30 Mikhail Semenov 28-Sep-22 22:30
 very nice article SeattleC++20-Sep-22 5:27 SeattleC++ 20-Sep-22 5:27
 Re: very nice article Mikhail Semenov20-Sep-22 7:13 Mikhail Semenov 20-Sep-22 7:13
 Last Visit: 31-Dec-99 19:00     Last Update: 4-Mar-24 16:02 Refresh 1