Dissecting the C++ STL Vector: Part 1 – Introduction

Hadi Brais

Rate me:

5.00/5 (2 votes)

14 Jul 2014CPOL8 min read

15.4K

Dissecting the C++ STL vector: Part 1 - introduction

CodeProjectThis is the beginning of a series about the vector container of the C++ Standard Template Library. While there are numerous resources discussing how and when to use this container, this series will be a little bit more advanced. We will get our hands dirty by ruthlessly dissecting the vector container source code. People usually (or never) talk about the source code mainly because there is no need to know anything about it. However, I always like to see what’s happening behind the scenes and so I’ve started this series. It will be hard because the source code is not well-documented and I haven’t seen any articles going at this depth. I will do my best to present the information as accurately as possible. The beginning will be about the programmatic aspects of the vector template and then I will slowly move towards the theory behind it. This part will introduce you to the vector template source code.

I have chosen the vector container because it is very common and most C++ programmers would be familiar with it. Also it has a simple programming interface. However, its implementation and algorithmic aspects are very interesting.

Before we get started, some assumptions must be stated. Firstly, since this an advanced series, I will not get into the basics of the vector container. Also, each compiler has its own standard library associated with it. The interface is the same but there could be many technical differences. The implementation that I will examine is the one associated with the Microsoft Visual C++ 2012/2013 compiler, and it is provided by P.J. Plauger. The version is V6.00:0009. The best way to follow this series is to have any flavor of Visual Studio 2012/2013 (The Express one is free). Alright, let’s get started.

A vector is a sequence of elements that are stored contiguously in memory and can change in size. As a result, it has support for random-access and provides methods to add and delete elements. It is typically used when an array is required, but the exact number of elements is unknown at compile-time. Let’s examine its implementation, which can be found in the vector source file.

The implementation seems overly complicated. This is due to the standard rules which must be respected. Containers are designed to be amazingly flexible and in such a way that they can be adapted to satisfy most different kinds of requirements, even those that rarely occur. It makes the easy case easy and the hard case possible. The vector template is defined in the std namespace as follows:

C++

template  > class vector
  : public _Vector_alloc<!is_empty<_Alloc>::value, _Vec_base_types<_Ty, _Alloc> >;

It has two type parameters, _Ty and _Alloc, the latter has a default type allocator<_Ty>, which in turn is defined in the xmemory0 source file in the std namespace. The first type parameter is the type of the vector’s elements. The second type parameter is responsible for managing the memory being consumed by the elements. The default allocator is sufficient in most cases and therefore the parameter is usually ignored. However, as we will see in future (hypothetical) article, we can solve real problems by writing our own allocator. The base type is _Vector_alloc which is defined in the same source file and is responsible for managing the allocator instance associated with the vector.

Here are the first few lines of the vector template definition:

C++

typedef vector<_Ty, _Alloc> _Myt;
typedef _Vector_alloc<!is_empty<_Alloc>::value, _Vec_base_types<_Ty, _Alloc>> _Mybase;
typedef _Alloc allocator_type;

typedef typename _Mybase::_Alty _Alty;

typedef typename _Mybase::value_type value_type;
typedef typename _Mybase::size_type size_type;
typedef typename _Mybase::difference_type difference_type;
typedef typename _Mybase::pointer pointer;
typedef typename _Mybase::const_pointer const_pointer;
typedef typename _Mybase::reference reference;
typedef typename _Mybase::const_reference const_reference;

#define _VICONT(it) it._Getcont()
#define _VIPTR(it) (it)._Ptr

typedef typename _Mybase::iterator iterator;
typedef typename _Mybase::const_iterator const_iterator;

typedef _STD reverse_iterator reverse_iterator;
typedef _STD reverse_iterator<const_iterator> const_reverse_iterator;

A bunch of type aliases are being defined in addition to two macros. Each of the macros takes a vector iterator as a parameter. The first macro represents calling the _Getcont method on the iterator, which returns a pointer to the associated container, and the second macro accesses the field _Ptr, which is a pointer to the element being pointed at by the iterator. Keen eyes would notice that the it in the second macro is surrounded by braces, but not in the first macro. This is usually done to resolve operator precedence issues that might arise from the operators within the it expression and the following dot operator. However, I have quickly scanned all uses of the macro and I found in all of them, it is just a variable – does not involve any operators. Which means that the braces are not necessary. Maybe at some point in time, they were required.

One might think that all of these typedefs will be helpful in making the syntax clearer. That’s partially true. However, some of them (allocator_type) are not used at all. The main reason for defining all of these is that in order for a type to qualify as an STL container, it must provide some typedefs. The following typedefs are required by the standard: allocator_type, value_type, size_type, difference_type, pointer, const_pointer, reference, const_reference, iterator, const_iterator, reverse_iterator, and const_reverse_iterator. The other typedefs _Myt, _Mybase, and _Alty are defined merely for syntactic simplification. In fact, that’s why they start with underscores: To discourage being used outside the vector template definition.

Here is what each of these typedefs represents:

value_type: The element type in the container
size_type: A type representing the number of elements in the container
difference_type: A type representing the difference between two iterators
pointer/reference: A pointer/reference to the element type stored in the container
const_pointer/const_reference: A pointer/reference to a const element type stored in the container
iterator/reverse_itertor: A type for iterating in order/reverse-order over the elements of the container
const_iterator/const_reverse_iterator: A type for iterating over const elements of the container
_Myt/_Mybase: Its type/its base type
allocator_type/_Alty: The former is provided for the sake of conforming to the standard, the latter is used internally for memory management. (_Alty most probably stands for ALlocator TYpe).

The vector template specialization with Boolean elements will probably not be discussed and is left as an exercise for the reader. Anyway, let’s just forget it for now.

It might be hard to digest all of these types if this is your first time. Once you go through the following example, things will be a little bit clearer. Suppose we have the following instance:

C++

std::vector myvector;

Then, its type would be _Myt = vector<int, allocator>, and derived from the following type:
_Mybase = std::_Vector_alloc<false,std::_Vec_base_types<int,std::allocator > >.

Nothing mysterious about the second type parameter. For the first template parameter, remember that it is not a type, it is a value of the same type as is_empty<_Alloc>::value. is_empty<_Ty> is a class template defined in the type_traits source file as follows:

C++

#define _IS_EMPTY(_Ty) : _Cat_base<__is_empty(_Ty)>
template
struct is_empty _IS_EMPTY(_Ty) {};

It is just a type with no members that is derived from _Cat_base template which is defined in the xtr1common source file as follows:

C++

template
struct _Cat_base : false_type {};

template<>
struct _Cat_base : true_type {};

The first definition is a template with a value parameter of type bool. The second is a template specialization of the first, with the value true. The important thing here is that the first is derived from false_type while the second is derived from true_type. In other words, if the value parameter is false, then is_empty would be derived from false_type, otherwise it would be derived from true_type. What is the thing that is being checked here? It comes from the _IS_EMPTY macro: __is_empty(_Ty). Fine, but what is _Ty? By taking a look at the declaration of the vector template, we notice that it is _Alloc, which is by default allocator<_Ty>. One more thing: what the heck is __is_empty(allocator<_Ty>)? You will not find the declaration of this thing anywhere in the standard library. __is_empty is an internal compiler keyword that is evaluated at compile-time. It takes a type and returns true if the type has no instance fields, and false otherwise. You can actually use it in your code, but you shouldn’t because it is designed to be used only by the standard library and might change in the future. So what would be the value of __is_empty(allocator<_Ty>)? It is equivalent to the question: does allocator<_Ty> have instance fields? By examining the source code, we notice that it does not have any instance fields. This means that __is_empty will return true and therefore true_type will be the grandparent type of is_empty. true_type has the following field:

C++

static const bool value = true;

This means that is_empty<_Alloc>::value is true, and its negation is false. _Mybase follows from this.

Why do we need to know whether _Alloc has instance fields or not? If you take a look at the declaration of _Vector_alloc, you will see that it is a template and it has a specialization. The template is used when the allocator has instance fields, and the specialization (the one used in this example) is used when the allocator does not have instance fields. The key difference is that the template defines an instance field of type _Alloc, while the specialization doesn’t. In other words, if the allocator has instance fields then it would be necessary to have and keep track of an instance of it, otherwise, there is no need to keep an instance. In each of these two cases, the code will be slightly different and therefore two types have been written to handle each case. For now, let’s keep it simple a stay away from the details of allocators.

This concludes the introduction. There are some important things that I have glossed over. What is the base type all about? What’s this allocator thing? And where the heck is the actual implementation of the vector? Well, I will answer these questions in detail in subsequent articles.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Written By

Hadi Brais

Syrian Arab Republic

This member has not yet provided a Biography. Assume it's interesting and varied, and probably something to do with programming.

Dissecting the C++ STL Vector: Part 1 – Introduction

License

Comments and Discussions