Introduction
This is an attempt to explain new && reference present in latest versions of compilers as part of implementing the new C++ 11 standard. Such as those shipping with Visual studio 10-11-12 and gcc 4.3-4, or beautiful fast( equally if not more) open-source alternative to gcc Clang.
Why you should start using it ? In short : Nontrivial performance gain.
For example inserts to std::vector (or in fact any array creation) will not cost huge amount of allocs/copies anymore.
But before we go to detail of new c++ 11 "Move semantics" we need to understand the core of problem. We need to understand why performance problem of c++ language with = assign operation often resulting to useless alloc/copy exists.
We live in the world where we need to work with a lot of data. We need to store and process it in an effective manner. Problem with c and c++ is that we got used to do it ineffectively.
a = 1
And since in natural language we think about "assigning" data to arrays too.
It was only natural and could be expected that we continued to assign to arrays in C language the same way.
textures[1] = new Texture(640*480*32,"cube.jpg");
The real price for copying via = was small since pointers are just numbers (containing memory addresses). But with objects the story is different.
How the performance problem started
Now I will try to keep this short but it is important to really understand why && operator was born.
In C++ working with references become prevalent since they are safer alternative to pointers and very importantly new language features such as automatic call of constructors or destructors (automatic cleanup) or operators worked only with them.
Now make no mistake References are internally still just pointers but this time made little bit safer and automatically dereferenced. Period. How are they made safer? Well. They can never contain invalid data since the only assignment to them is allowed during their declaration and only to existing statically declared data. But whenever you pass reference to function in fact it is still just pointer that is internally pushed on stack.
void proc( Texture * texture ) { Save(texture); } void proc( Texture & texture ) { Save(texture); }
C++ wanted to make our life easier by doing routine pointer de/referencing for us (and hiding this pointer wizardry from us). Illusion of passing or working with objects instead of pointers to them was so perfect that many lost sense what is actually reference and what is object.
Exactly this ambiguity had unfortunate side effect that led us into believing that this
Texture texture[1000];
texture[1] = Texture("sky.jpg");
is just is safer alternative to this
Texture * texture[1000];
texture[1] = new Texture("sky.jpg");
We just got rid of unsafe pointers in favor of safer references like everywhere else.But more importantly this has given us automatic call of constructors destructor and operators.
Right?
Wrong. There is no such thing as array of references in C++.
No pointer magic behind the scene is going on this time like it is within functions.
So what we actually created is algorithmically very bad decision.
We created an array of objects stored by value. And no. Removing * from declaration doesn't automatically make pointer variable reference.
From performance point of view there is really no alternative to array of pointers . With big objects containing statically declared structured data there is big performance difference when creating sorting searching and mainly reallocating 100mb array of values. Then it is with few bytes of pointers to them. It pretty much kills the possibility to work within processor cache with as much data as possible. So we sort/search/reallocate etc on the speed of main memory order of magnitude slower instead of cpu cache which we now uselessly pollute with unimportant object data. With objects containing big dynamic data situation is better when sorting etc. But still bad when assigned to array. Since then we need to allocate and copy large chunks of dynamic mem plus rest of object.
But I would say mentioned reallocation is worst consequence of storing object by values.
So every time you think of "nah lets put vector<large-object> there".
Your memory requirements will be twice of what they need to be and your performance abysmal due to fact that whole allocated mem will be uselessly moved around after each realloc. One would think that this is just price we indeed agreed to pay for generic approaches and laziness.
But by my opinion this is the price for storing objects instead of pointers for the oo reasons (automatic constructors destructors operators) mentioned above.
But back to the topic. As we remember assign = "actually" copies data. And this leads us to another big performance problem with arrays.
How to efficiently build array of large objects.
Suppose you wanna build a city full of skyscrapers and you obviously (due to large scale) can't afford any waste of time or resources.
Now think of city as an array analogy. So what you actually do is you "create" building inside city
- you allocate large space for building inside of city
- and obviously you just build building in this space.
But thanks to our habit of using = operator to store object pointers to array in good old C.
Its only natural that we attempt to use the same "create and assign" paradigm with references too; In other words. We simply got used to it and what's worse every c++ book teaches as to do it this ineffective way.
Skyscrapper city[1000];
city[1] = Skyscrapper("Empire State Building");
So instead of "create inside array" which we learned from city analogy.
- you allocate large temporary space outside city // = Skyscraper();
- you build skyscraper in this temporary space
- you allocate the same large space inside city // city[1].operator = (Skyscraper &in)
- you use large amount uf trucks gas and time to move this skyscraper to space in city
Now worst is the useless copy part. Why? Its an order of magnitude slower than all the steps combined. Because when usually even the largest objects are created this large memory is merely preallocated. That means usually no useful data are yet in them. Yet by forcing copy operation we are forced to move them byte by byte no matter whether data is valid or not. The larger the objects own+dynamic memory the larger the performance drop. How big is the performance drop ? Now It would seem unbelivable but as we speak
every usual object assigned by reference and not by pointer in c++ suffers this and assigning to arrays is worst. just check
benchmark results in middle of article.
That is pretty much most of the code around you. Go ahead an check nearest code
It manifests with arrays so strongly because of sheer number of ineffective assignments. In case of our benchmark its 500 assignments but if you do any reference assignment that can be handled by moving (as explained later) and not copying in loop with 500 iterations you have basically the same problem.
But back to the arrays. So are there ways in c++ to store object to array effectively ?(damn... I got used to it too . I mean create object in a array effectively) without wasted cpu and mem?
Since as you seen in benchmark. The larger the objects are the more it matters.
Yes there are but they are nor intuitive or obvious and are hack like in nature.
Now if C++ did allow us to invoke specialized constructor and create objects using already allocated space inside array(that was allocated just once for all elements by one efficient alloc).
Then this could saved zillion of allocs/copies most of us usually do by assigning new objects via = ;
Skyscrapper city[1000];
city[1]("empire state building")
Still. There are some ways to do it.
You can for example move all your initialization code to method and invoke it on array element explicitly.
Skyscrapper city[1000];
city[1].create("empire state building");
- you allocate large space for building inside city
- you build building in this space.
Hurray. The problem introduced by using = is gone.
That means now it doesn't matter if object have large statically declared array or structure. No copy no problem.
Most importantly the wasted cpu on moving mostly useless empty bytes is gone.
Also positive is the fact that reading and writing such a large chunk of memory
which pretty much flushed all cpu caches bringing down performance of the rest of the application is gone too.
That's all nice and great. But chances that people will stop putting code to constructors in favor of some standard create method are pretty slim.
Using constructors to create everything is paradigm that we got so used to and love.
Exactly as we got trained by misleading books and used = operator for storing data to arrays.
Still.
There is way to do it with constructor via little known variant of operator new Its called "placement new" where you construct object on existing memory with this
pointer provided by you.
But now we are entering very weird confusing and little bit dangerous territory due to word new flying around statically declared array. New that doesn't allocate anything. New that is here just as a kludge to invoke constructor.
Why dangerous? The moment you overload something as fundamental as allocator New brace yourself for all kind of troubles http://www.drdobbs.com/article/print?articleID=184401369&dept_url=/cpp/
#include <new>
Skyscrapper city[1000];
new (&city[1]) Skyscraper("empire state building"); city[1].~Skyscraper();
new (&city[1]) Skyscraper("renovated empire state building");
city[1].~Skyscraper();
It's unnatural and this time problematic plus very confusing too.
But Storing objects was always very bad idea anyway as you can see in
sorting results in benchmark bellow. So what about returning to storing just pointers? As we mentioned above no oo goodies for pointers.
{
vector<Texture*> textures(100); textures[1] = new Texture(640*480*32,"cube.jpg"); }
Most importantly when vector goes out of scope no destructors are automatically called. it can be done manually but it reintroduces source of bugs.
C++ could solve it by introducing scoped version of new. like new_local. In such case compiller would generate calling destructors when leaving scope exactly as it is doing today with automatic objects.
Update: I attempted to implement it in
this articleIs automatic cleanup of objects stored by pointers really that impossible in current c++?
Now consider following weird but perfectly working example. Remember stack is defaultly limited (unless you change it in linker settings) resource. So take this as purely academic example that array of pointers that automatically calls destructors when going out of scope is possible.
struct A{
int no;
operator A* () { return this; } };
void array_of_objects_stored_by_pointers_with_autocleanup() {
A* a[10000]={0}; a[7] = A(); a[7]->no=7; a[4] = A(); a[4]->no=4; }
The moment array of objects stored by pointers goes out of scope they are automatically deallocated without any manual destructor invocation;
How come this works ? What is going on?
= A() is internally the same as = new A();
the same constructor is invoked. Except for first stack is used by allocator and heap for second.
both return pointers. references are pointers(just meeting certain criteria to deserve label reference) as we remember right ?
Yes for heap pointers (created by new) there is kludge of wrapping all pointers to objects simulating pointers via operator overloading aka(smart pointers) in std::shared_ptr and alike. and store just those. So if you dont mind wrapping all your variables to functionality hiding impossible to debug macros/templates then this is very good solution for you.
But I strongly believe that simple things shall not be encrypted or hidden from sight nor does every variable.
Programmer must be aware of what is going on as much as possible like it was in C. without having template and macro expander build in his head.
And if you ask Linus to obfuscate every pointer to template wrapper he would most probably kill you.
I remember strong backlash against excessive macro usage in C. And there was rational reason for that.
That reason was "complexity and hiding code logic is source of bugs".
The possibly best solution is to fix C++ compillers = operation
After all those attempts to store anything via = without waste I think that compiler should do "placement new" transparently for us whenever he encounters = on movable object. Instead of forcing us to implement zillion operators that solve only heap side of problem. Deciding what kind of "this" object should use is easy since static analysis deciding what object is movable is already part of new compilers as part of support for &&.
Skyscraper city[1000]; city[1] = Skyscraper("Empire State Building");
So this internally can be optimized (via explicit optimization switch) to something like
Skyscraper city[1000]; try { new (&city[1]) Skyscraper("Empire State Building");
} catch(...) { new (&city[1]) Skyscraper() throw; }
This would fix dynamic and static(object mem) waste = zero alloc/copy since elements are created just once in already allocated memory as it always should had been for performance reasons.
Why is static(non-dynamic) mem waste equally if not more important? Majority of objects are small and selfcontained. And when you look at benchmark bellow storing object containing statically declared array took 5662 ms yet storing object containing dynamic array of the same size took 1372 ms.
Also. After such change to compiler all old code using big objects would start pretty much flying at completely different speeds just by recompiling.
Because I am curious person I am attempting to implement and test it in clang fantastic open source c++compiler as a optimization switch or pragma. Should you wanna lend a hand I will be more than thankful
http://clang-developers.42468.n3.nabble.com/Proposed-C-optimization-with-big-speed-gains-with-big-objects-tt4026886.html
But let's focus on latest C++ solution to it (unfortunately only for heap mem in your objects and with a lot of code changes)
The new C++ 11 Move semantics
Move semantics enables you to write code that transfers dynamically allocated memory from one object to another. Move semantics works because it enables this memory to be transferred from temporary objects(by copying just pointers) that cannot be referenced elsewhere in the program. Unfortunately large statically declared (contained-within object) arrays/structs/members data must still be uselessly copied since as mentioned they are contained within temp object themselves that is about to be destroyed.
To implement move semantics, you typically provide a move constructor, and optionally a move assignment operator= to your class. Copy and assignment operations whose sources are (temp objects or data that can't change) then automatically take advantage of move semantics. Unlike the default copy constructor, the compiler does not provide a default move constructor.
You can also overload ordinary functions and operators to take advantage of move semantics. Visual C++ 2010 introduces move semantics into the Standard Template Library (STL). For example, the string class implements operations that perform move semantics. Consider the following example that concatenates several strings.
string s = string("h") + "e" + "ll" + "o";
Before && references existed, each call to operator+ allocated and returned a new temp object. operator+ couldn't append one string to the other because it didn't know whether content of the source can be tampered with (temps) or not (variables). If the source strings are both variables, they might be referenced elsewhere in the program and therefore must not be modified.
But now thanks to && reference we now know that temp (which cannot be referenced elsewhere in the program) was passed in. Therefore, operator+ can now safely append one string to another. This can significantly reduce the number of dynamic memory allocations that the string class must perform.
Move semantics also helps when the compiler cannot use Return Value Optimization (RVO) or Named Return Value Optimization (NRVO). In these cases, the compiler calls the move constructor if the type defines it.
As an another example consider the example of inserting an element into a vector object. If the capacity of the vector object is exceeded, the vector object must reallocate memory for its elements and then copy each element to another memory location to make room for the inserted element. When an insertion operation copies an element, it creates a new element, calls the copy constructor to copy the data from the previous element to the new element, and then destroys the previous element. Move semantics enables you to move objects directly without having to perform expensive memory allocation and copy operations.
So. To take advantage of move semantics to allow efficient insert of your objects in the std::vector, you must write a move constructor to allow moving of data from one object to another.
So let's see what is going on within our usual inefficient city copy operator = example but with move operator = implemented
Well. Additionally to your
copy operator = (&) where you always just copy assigned variable data
Now you define additional
move operator = (&&) that will be called instead when data that canot change (such as temp object created when assigning to array) is passed in.
Skyscraper city[1000];
city[1] = Skyscraper("Empire State Building");
- you allocate large space for building outside of city
// notice = Skyscrapper();
- you build building in this space.
- you just mark this already created building as part of city // no trucks (copying) needed
void operator = ( Skyscraper && in ) { mem = in.mem; size = in.size; in.mem = 0; }
~Skyscraper() { if(mem) delete mem; }
Hurray no allocate/copy from temp object = finally no cpu is wasted and cache trashed
Memory allocated and initialized by temp is not released since it has new owner.
For complete compilable example copy benchmark code bellow to dev env of your choice.
No for those who thing everything was clear and obvious in previous example.
Dont' let the eyes fool you.
void operator = ( Skyscraper && in ) { next_proc(in); next_proc((Skyscraper &&)in); }
Skyscraper && in is not actually of type && anymore. The moment it enters function its & again. So
if you wana forward && to another function you need to cast it to && again (in stl via std::move). Why c++ decided to do this behind your back hidden functionality ? Well I am being told that it's security precaution. That any && having name is in risk of being referenced somewhere else in code and thus it's not deemed safe for keeping "moveable" status. Seems like some unfinished c++ business to me since I can't imagine referencing local variable outside of this proc.
Also there is little know feature of ref-specifiers where you can restrict operator/methods to accept just temps or just variables.
struct string {
string& operator=(string const& other) & { }
};
Now, you can't anymore say
string() = "hello";
Unfortunately this doesn't yet seem to be supported in Visual Studio 2012 RC1 that I am using right now.
So to summarize. Unless you use contained big statically declared (stored within object) structures/arrays/members the result is significant speedup of your existing code. Tho see how much you can speedup your existing code (well... actually you stop slowing it down)I created simple practical && example along with benchmark results. But if you do more than just stupid memset in your constructors/destructors speedups will be significantly higher.
Benchmark Results:
- store objects containing array themself took 5662 ms // even with && this is still problem
- sort objects containing array themself took 17660 ms //this is why you should not store objects
- store objects containing dynamic array by copying took 1372 ms
- store objects containing dynamic array by moving (c++ 11) took 500 ms
- store just pointers to objects took 484 ms
- sort just pointers to objects took 0 ms
Benchmark Code
To have an idea how bad the usual careless assign = is.
I created example storing 500 large objects to array via different methods and measured time it takes.
Texture represents standard large object we work in c++ on daily basis = just large chunk of data and its size
plus mandatory operator = to be able to be stored by value. Now it's stripped to bare minimum on purpose(no chaining consts etc) . with only simple types so you can focus only on those two operators. And sort has < reversed to simulate worst case scenario.
#include <stdio.h>
#include <stdlib.h>
#include <windows.h>
#include <algorithm>
using namespace std;
struct Texture { int size; int color; void* mem;
Texture() : size(0), color(0), mem(0) {}
~Texture() {
if(mem) delete mem; mem = 0;
}
Texture ( int Size, int Color ) {
mem=new char[size=Size];
memset(mem,color=Color,size);
}
void operator = ( Texture & in ) { color = in.color;
mem = new char[size=in.size]; memcpy(mem,in.mem,in.size); }
};
struct Texture2 : virtual Texture {
void operator = ( Texture && in ) { color= in.color;
mem = in.mem; size = in.size; in.mem = 0; }
};
struct Texture2StaticArray : Texture2 { Texture2StaticArray() : Texture() {} Texture2StaticArray( int Size, int Color ) {
memset(static_array,color=Color,sizeof(static_array));
}
char static_array[640*480*8];
};
#define TEXTURES 500
void store_objects_containing_static_array() {
Texture2StaticArray* texture =new Texture2StaticArray[TEXTURES];
DWORD start = GetTickCount();
for(int i=0;i<TEXTURES;i++) {
texture[i] = Texture2StaticArray(0,i);
}
printf("\nstore objects containing static array took %d ms", GetTickCount()-start );
start = GetTickCount();
sort(texture,texture+TEXTURES, [] (Texture2StaticArray& a, Texture2StaticArray& b) { return a.color > b.color; } );
printf("\nsort objects containing static array took %d ms", GetTickCount()-start );
delete [] texture;
}
void store_objects_containing_dynamic_array_current_standard() {
Texture texture [TEXTURES];
DWORD start = GetTickCount();
for(int i=0;i<TEXTURES;i++) {
texture[i] = Texture(640*480*8,i);
}
printf("\nstore objects containing dynamic array by copying took %d ms", GetTickCount()-start );
}
void store_objects_containing_dynamic_array_new_standard() {
Texture2 texture [TEXTURES];
DWORD start = GetTickCount();
for(int i=0;i<TEXTURES;i++) {
texture[i] = Texture(640*480*8,i);
}
printf("\nstore objects containing dynamic array by moving (c++ 11) took %d ms", GetTickCount()-start );
}
void store_pointers_to_any_object() {
Texture* texture [TEXTURES];
DWORD start = GetTickCount();
for(int i=0;i<TEXTURES;i++) {
texture[i] = new Texture(640*480*8,i);
}
printf("\nstore just pointers to objects took %d ms", GetTickCount()-start );
start = GetTickCount();
sort(texture,texture+TEXTURES, [] (Texture* a, Texture* b) { return a->color > b->color; });
printf("\nsort just pointers to objects took %d ms", GetTickCount()-start );
for(int i=0;i<TEXTURES;i++) { delete texture[i];
}
}
void main() {
store_objects_containing_static_array();
store_objects_containing_dynamic_array_current_standard();
store_objects_containing_dynamic_array_new_standard();
store_pointers_to_any_object();
Sleep(-1);
}
Now the more observable of you would probably would start arguing...
"This is nothing new I could do this data "moving" (or just passing data along between objects without copying) the same way in current standard c++ operator = & so why do I need new && operator ?
Yes you can and No you can't. If you did moving in operator = & like this. Imagine what would happen
... void operator = ( const string & in ) {
mem = in.mem; size = in.size;
in.mem = 0;
}
...
string a("white"),b("yellow");
string c=b; ...
b="gotcha..."
If we moved = copied just pointers to data in standard operator = &
Then whenewer b changes c changes too;
And this was not intended
- so we actually wana make copy when assigning from data that can change
- and we just copy pointers to data that we are sure will not change
Unfortunately & up to c++ 11 could not distingush if passed data can change
so moving was not possible in current c++ standard for the reasons explained in c=b example.
the new && in turn can distingush that data which cant change was passed in and thus its safe
just point to its data and skip copying.
So to summarize.
in new c++ 11 standard you are now supposed to keep two sets of operators and constructors
- operator = and constructor taking & where you copy from data that can change (variables etc,)
- operator = and constructor taking && where you just point to data that will not change and save mem and cpu by skipping copying (temp objects,etc)
Unfotunately that means you will now have to implement two sets of pretty much every operator under the sun that you declared with generic copy/paste like code yet still fixing only heap side of the performance problem.
So reconsider using = on objects at all.
At least until compiller writers fix heap and static mem waste by doing internal placement new for = on movable objects.
string b(a); string c=b; string texts[10];
texts[1]= string("white fox");
Why it's called rvalue reference &&
Now the whole article I was deliberately was not using termins like rvalues(cant change) and lvalues(can change) since they are not what their names imply.
lvalue should had been named something like "variable"
rvalue should had been named something like "temp"
So whenever you read text using lvalue and rvalue just replace those two and sudenly text will make sense
They are just technical grammar remnants confusing people.
the were born from how C grammar in lex and yacc was described eg on what side of
"that particular grammar rule they vere located = left or right" BUT that particular rule can be part of larger expression and lvalue is sudenly rvalue.
Or let me explain it this way.
Anything not having name is rvalue otherwise it's lvalue
Take care